Message boards : Number crunching : Output file .... absent
Author | Message |
---|---|
Joined: 9 Aug 05 Posts: 4 Credit: 2,613,064 RAC: 2,241 |
Hi everybody, I hope this is the correct forum. It seems that one of the two work units completed without packaging any results to send. I found these messages in the log: 08/05/2011 03:04:17 climateprediction.net Computation for task hadcm3n_p4eo_1900_40_007223296_1 finished 08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_1.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent 08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_2.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent 08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_3.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent 08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_4.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent Is there a way to produce the result files so that all the work done is not thrown away? |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
Unfortunately, the model crashed before creating the first Zip file. The list of absent files is just BOINC recording that the model has finished before generating all the files it was expecting to send back to the project. (There is one file per decade, so four files in total for a 40-year model.) The stderr log on the task page shows a lot of BOINC quit requests. Perhaps one of the shutdowns caused the crash. It is a good idea to close down BOINC manually, particularly with two large HADCM3N models running. |
Joined: 9 Aug 05 Posts: 4 Credit: 2,613,064 RAC: 2,241 |
OK for future work, but in other words, that work unit is lost. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Yes, I'm afraid the Work Unit is lost. In future you might try making backups every few days. This will allow you to restore a crashed WU and carry on with only a minimal loss of time. Info on how to make a backup and do a restore can be found at the top of the “Number Crunching” section in the “information about running the climate models” thread. Unfortunately, the backups have to be made before the WU crashes, so there is no way to fix the one you just lost. |
Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
After finishing a hadam3p WU, which took more than 200 hours on my Linux box, I downloaded a hadcm3n WU, which lasted less than a minute and ended with an "output file absent" message. Was it a corrupt WU? Tullio |
Joined: 27 Feb 08 Posts: 4 Credit: 960,510 RAC: 0 |
This http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12945697 WU crashed for me after just 1 min too. Wasn't doing anything special at the time. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Urglab & tullio, both tasks terminated shortly after start-up. Stderr error report: INVALID THETA. That error indicates model instability and is not unusual with FAMOUS tasks, but this is the first time I've seen it with HadCM3n. I have nothing to suggest except that it looks like a bad batch of work -- and to hope the next batch is better. Thanks for reporting the problem. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Yep, same for me with hadcm3n_q7ar_1940_40_007280157_0 . Invalid Theta in stderr. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The programmers know about this. It's an overenthusiastic use of CO2 forcing, which has caused the models to turn into a "Venus world" in a few seconds. We're now waiting for the RAPIT/RAPID people to decide what values to use instead. No need to report more failures, thanks. :) |
Joined: 14 Sep 10 Posts: 11 Credit: 1,812,972 RAC: 0 |
"Venus world"? You mean so much water vapor feedback that the oceans evaporate? What CO2 forcing are they using? |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I think that I snagged another of those bad CM3n WUs. HadCm3n_8_1940_007280448 crashed after running for only 1 min 2 sec. There is no telling how many other people downloaded these extreme CO2 forcing WUs from the same batch and just haven't started them yet. At least it didn't waste a lot of computer time before the crash. |
Joined: 15 May 09 Posts: 4552 Credit: 19,039,635 RAC: 18,944 |
I have just had another of these models crash. UK Met Office Coupled Model Full Resolution Ocean v6.07 Interestingly, both crashes happened just after restarting the computer and restarting BOINC. Yes, I had suspended the model and shut BOINC down before shutting down the box. I don't know if this is of any use to those who put the models together or not. Dave |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
[Dave wrote:] "I have just had another of these models crash. UK Met Office Coupled Model Full Resolution Ocean v6.07 ..." The HADCM3N models in the queue are a mix of valid (unforced) models and invalid (forced) models. Just keep downloading them: if there are any valid ones left and you get one then let it run, otherwise let the invalid ones crash and don't attempt to rescue them. It's an odd way to sort the wheat from the chaff, but it works ... |
Joined: 27 Jan 07 Posts: 301 Credit: 3,288,263 RAC: 26,370 |
"[Dave wrote:] I have just had another of these models crash. UK Met Office Coupled Model Full Resolution Ocean v6.07 ... The HADCM3N models in the queue are a mix of valid (unforced) models and invalid (forced) models. Just keep downloading them: if there are any valid ones left and you get one then let it run, otherwise let the invalid ones crash and don't attempt to rescue them. It's an odd way to sort the wheat from the chaff, but it works ..." The downsides are large, wasteful downloads and a small amount of wasted CPU time. For people with limited connection speeds, this can be a pain. Isn't there a way to send kill packets to clients running these models with bad parameters? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"Isn't there a way to send kill packets to clients running these models with bad parameters?" No. And what you say doesn't make sense anyway, as the models crash so fast. The 2 that I tried ran for 1 minute and 4 seconds, and for 4 seconds repeated 5 times. These are probably the same thing, as my 2 computers are running slightly different versions of BOINC, which report things slightly differently. There have been posts about this in the News thread, to which everyone should subscribe. As for large downloads, that's life. Long-time crunchers can just keep up with the news and manually stop downloading. Keep in mind another post, which said that whatever models are available are being grabbed before they can fall from the end of the conveyor belt into the storage bin. 40,000 computers, a couple of thousand models, slowly being prepared. Not good for people wanting lots of work. :( |
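For anyone who wants to check their own machine for the problem described at the top of this thread, here is a minimal Python sketch that picks the "Output file ... absent" messages out of BOINC log text and groups them by task. It assumes the log lines look exactly like the ones quoted in the first post; the names `ABSENT_RE`, `absent_outputs` and `SAMPLE_LOG` are made up for this illustration and are not part of BOINC. It only reports which result files were never written. It cannot recover them.

```python
import re
from collections import defaultdict

# Pattern for the message quoted in the first post (assumed format):
#   "Output file <file> for task <task> absent"
ABSENT_RE = re.compile(r"Output file (?P<file>\S+) for task (?P<task>\S+) absent")


def absent_outputs(log_text):
    """Map each task name to the output files the log reports as absent."""
    missing = defaultdict(list)
    # finditer scans the whole text, so it works whether the log has one
    # message per line or has been pasted as a single long line.
    for match in ABSENT_RE.finditer(log_text):
        missing[match.group("task")].append(match.group("file"))
    return dict(missing)


# Sample text copied from the log excerpt in the first post.
SAMPLE_LOG = """\
08/05/2011 03:04:17 climateprediction.net Computation for task hadcm3n_p4eo_1900_40_007223296_1 finished
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_1.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_2.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_3.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_4.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
"""

if __name__ == "__main__":
    for task, files in absent_outputs(SAMPLE_LOG).items():
        print(f"{task}: {len(files)} output file(s) absent")
        for name in files:
            print(f"  {name}")
```

A task that lists every one of its expected Zip files as absent (four for a 40-year model, one per decade) stopped before the first decade was written, which matches the diagnosis given earlier in the thread.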
©2025 cpdn.org