Message boards : Number crunching : Frozen WU ???
Author | Message |
---|---|
Joined: 5 Mar 05 Posts: 64 Credit: 790,577 RAC: 0 |
This WU - http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=1921937 reached 100% completion half an hour ago, but my hard disk is still working frantically. BOINC Manager states the WU is still running: CPU time 1462.51.33, Progress 100% and To Completion = 0.00. No other messages appear in BOINC Manager. Any ideas, anyone? (To stop the hard disk activity, I've now suspended the WU.) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Possibly something graphical crashed it. The disk activity is suspicious, as is the 100% completion, which usually means that you have an orphan process: BOINC has lost contact with the program and thinks the model is finished. While it's suspended, turn off network access, make a backup to a new location (don't overwrite previous backups!), then reboot. Let it run with network access still suspended, and see if it works now. If not, restore a previous backup. |
Joined: 5 Mar 05 Posts: 64 Credit: 790,577 RAC: 0 |
Thanks for the prompt reply, but the WU is still not finalising and uploading. The continuous hard disk activity is still there, so I've suspended it again. BOINC Manager now states CPU time 1462.51.33, Progress 0% and To Completion = 1618.48.57. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Perhaps I was wrong about the state of the model. If it was almost complete, then the hard disk activity could just be the half hour or so of zipping the data and preparing the final uploads. In that case, just leave it for a "while" (an hour or two?). The figures that you now quote are in line with a model that is starting from a checkpoint but hasn't finished reloading the data to restart from there, which usually takes only a few seconds (5-6?). Also note that sulphur models aren't of much use to the researchers now, but they ARE (quietly) desperate for another 2-3 thousand TCMs ASAP. PS Perhaps your HD is starting to fail and is trying to read data from a bad area. |
Joined: 5 Mar 05 Posts: 64 Credit: 790,577 RAC: 0 |
Thanks Les - I was probably being a little impatient! Another 20 minutes of run time this morning and it finalised perfectly - http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1221158 |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Congratulations. It's nice to see all those graphs on the model's page. |
Joined: 25 Nov 05 Posts: 11 Credit: 870,090 RAC: 0 |
Well done John! |
Joined: 5 Mar 05 Posts: 64 Credit: 790,577 RAC: 0 |
Now I know what to expect next time I run a CPDN WU. I'll be taking a break from CPDN for a few weeks, but I'll return refreshed and ready to go on a new WU. Is it true that the new TCM models take twice as long as a sulphur model? The sulphur model I've just completed took around 7 months of crunching, so I would be in danger of going over the deadline if it is still 1 year... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Yes, about that. Single phase, 160 model years. But the deadline is just there because something has to be. It takes about 3 months on a P4 3.2 GHz machine running 24/7 with no other projects. The new models are different from the slab/sulphur models. They upload data every year (early December), with a bigger upload every 10 years and a very big restart dump every 40 years. And they don't leave any data on the HD after they finish, provided they don't crash. |
Joined: 5 Mar 05 Posts: 64 Credit: 790,577 RAC: 0 |
Thanks for the info, Les! (I like this place: finally somewhere where the admins care about and respond to the crunchers. A few of the projects, which shall remain nameless, do not show this level of support for crunchers.) I won't be able to run CPDN exclusively 24/7, but I will certainly give it a good proportion of my CPU output in future. |
Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Is there really such a hurry? If there is, I could alter my preferences a bit. I can't offer 24/7, but if I let this box crunch 100% CPDN for a while, 12 hours a day or so should be realistic. That seems quite okay, as my box takes only about 1100 CPU secs per timestep, which is reasonably quick compared to what I've seen out there, so it shouldn't really be slower than your example P4. So, what I want to say is ^^ if it is important for the science, just tell me and you'll at least get one model back a bit quicker ;-) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It was a slightly tongue-in-cheek hint to the thousands of part-time crunchers who are running a dozen or more projects at once, with CPDN getting maybe one hour a day. (I made this up too.) But the Transient Coupled Models have been available since March, and some people are still working on slab and sulphur models. With over 60 thousand computers crunching away, it was hoped that results would be further along by now. Also, in the Seasonal Attribution Project, which closed for its primary purpose at the end of October, there are still over 10,000 models out there. These will be used by the secondary researchers, but only if they are returned by, perhaps, the end of this year. So the climate projects aren't getting much serious attention from a lot of the people running them. I'm just someone interested in the climate projects, to the extent that I've been helping on the 3 climate help boards for ages, but I'm not connected in any way with the core team. So I can talk with some experience about problems running the programs, but with no authority whatsoever on what is wanted / hoped for / needed by the project people. One of the core people recently said that they had been hoping for about 5,000 models by now. So, just coming from me, yes, it would be nice if you could "move it up a gear". And anyone else. Thank you. PS I'll soon be back here myself, after 4 months on a spinup model and then 7 months on SAP models. Another couple of days to let this last SAP get a good start, and then I'm also going to start a BBC TCM and a CPDN TCM, to synchronise my CPIDs. When the SAP finishes in about 10 days or so, it'll be 2 TCMs at full speed. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
...Also, in the Seasonal Attribution Project, which closed for the primary purpose at the end of October, there are still over 10,000 models still out there. These will get used by the secondary researchers, but only if they get returned by, perhaps, the end of this year... BoincStats shows that only 509 PCs returned a trickle in the last 24 hours, which I'd guess indicates that a lot of the 'models in progress' are actually 'lost in action'. Of course, it's impossible to say exactly how many, since some PCs may have multiple viable models on them. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Sounds fair enough, Les. I'll see what I can do, although I'm not one of the people with "one thousand projects" or so (actually, this PC is shared 50/50 between Einstein and CPDN, with SETI only on my notebook and HashClash inactive for the time being), but I try to help where it is really needed. And whereas in projects like SETI, Einstein or Rosetta even old P3s can be used and show fair performance in the long run, CPDN seems to have high CPU/RAM requirements, which probably prevent some interested users from joining. They did for me before I got this box, so now that I have the power, why not use it here? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've been watching the stats for SAP more closely since the 'main' closure on October 31, and the number of "Results in progress" is dropping quite fast now. It was only going to be a temporary thing, but I've started writing down the numbers, unfortunately without dates at the start. (I'm probably bored.) They started at 11,331 at about the closure, and are now at 10,441. And quite a few crunchers around 'my area' of the credits are starting to 'drop out'. But a few have started a new model recently, so several people have an extra model or two 'up their sleeve'. Another 24 hours, and I think that I'll see 4 or 5 more drop out. *************** Yes, the climate projects do have rather a 'hi-tech' requirement, but one can't expect much success running a supercomputer program on a low-end desktop. It's always a surprise when people do manage it with low RAM, for instance. But all of the simple climate models have been run, and now the researchers are interested in looking 'deeper' into weather and climate. So it's just as well that people are starting to upgrade to more modern, more powerful computers, such as the 'newish' Core Duo. It looks like these will be needed before long. (They're fast.) |
Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Hey, I wasn't complaining ;-) rather the opposite... it was meant like "you need it, I've got it now"... I know you have good reasons for making the app so "hi-tech". What I wanted to say is that at other projects it's easier for other people to contribute, so it's not so bad if I do a bit less there for a while, whereas here it's really limited to those with good computers. And yes, trying to run climate projects on slower PCs SUCKS. I tried it once on my old laptop... 496 MB of memory (at best -.- shared-RAM graphics card) and a Celeron M processor at 1.3 GHz... an okay machine for most of the other projects (I've been doing SETI, Einstein and HashClash on it... okay, WUs take longer, but apart from that they all ran fine), so I thought I'd try BBC Climate Change ^^ yes, I can be a bit extreme if I find something really interesting, and besides, I don't have a very high opinion of people saying "it won't run", because often it will. But this time, it didn't. After the third major crash I gave up... No idea if it was too little memory, if the memory was just too slow (133 MHz, I know it's pathetic) or if my CPU played a role as well... the only good thing is it didn't overheat ^^ no problems there, but after this experiment I really can't advise people below the minimum requirements to run these projects. I'm glad I'm back now with something faster, though :-D |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Sorry, Annika, just commiserating, and explaining to anyone else out there who might show up here. Your offer of extra time is appreciated. Thanks. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
I've been watching the stats for SAP more closely since the 'main' closure on October 31, and the number for "Results in progress" is dropping quite fast now. I've been writing down the 'completed models' figures since around March, and the 'WUs in queue' figures between August and October :-) Some people have picked up reissued models whose first generation recently crashed, although I don't think many of those are available (probably a handful per day). This graph from netsoft online shows the project activity quite well, I thought: Note the very steep drop after work ran out! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
'Tis indeed a nice graph. The number is now at 10,398 and falling. |
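A rough sketch of the arithmetic in the TCM discussion earlier in the thread. The 160 model years, the yearly / 10-year / 40-year upload pattern, the 7-month sulphur run, the "twice as long" factor and the 1-year deadline all come from the posts. The assumption that every upload falls exactly at the end of a model year, and the function names `upload_schedule` and `finishes_before_deadline`, are hypothetical and for illustration only; they are not part of BOINC or CPDN.

```python
# Hypothetical sketch: figures are taken from posts in this thread; the
# function names are illustrative only, not part of BOINC or CPDN.

TCM_MODEL_YEARS = 160  # "Single phase, 160 model years"


def upload_schedule(model_years: int = TCM_MODEL_YEARS) -> dict:
    """Count uploads, ASSUMING one upload at the end of every model year and
    the bigger 10-year and 40-year events landing on exact multiples."""
    years = range(1, model_years + 1)
    return {
        "yearly uploads": len(years),
        "10-year uploads": sum(1 for y in years if y % 10 == 0),
        "40-year restart dumps": sum(1 for y in years if y % 40 == 0),
    }


def finishes_before_deadline(previous_months: float, slowdown: float,
                             deadline_months: float = 12.0) -> bool:
    """Project the new run time as the previous run time times a slowdown
    factor, then compare it with the deadline."""
    return previous_months * slowdown <= deadline_months


print(upload_schedule())
# {'yearly uploads': 160, '10-year uploads': 16, '40-year restart dumps': 4}
print(finishes_before_deadline(previous_months=7, slowdown=2))  # False: 14 > 12
```

Under those assumptions, a 160-year TCM makes 160 yearly uploads, 16 bigger 10-year uploads and 4 restart dumps, and a machine that needed about 7 months for a sulphur model would project to about 14 months for a TCM. As Les pointed out, the deadline "is just there because something has to be", so a `False` here is better read as a warning than as a hard cutoff.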
©2024 cpdn.org