Message boards : Number crunching : After 333 hours, computation error! Why? Any way to fix it?
Joined: 21 Jul 05 Posts: 9 Credit: 163,228 RAC: 0
Hi, I don't want to lose these 333 hours of work. This is the output:

12/16/2013 1:29:53 PM | climateprediction.net | Sending scheduler request: To send trickle-up message.
12/16/2013 1:29:53 PM | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
12/16/2013 1:29:58 PM | climateprediction.net | Scheduler request completed
12/16/2013 1:30:05 PM | climateprediction.net | Computation for task hadcm3n_ob2t_1900_40_008469480_0 finished
12/16/2013 1:30:05 PM | climateprediction.net | Output file hadcm3n_ob2t_1900_40_008469480_0_3.zip for task hadcm3n_ob2t_1900_40_008469480_0 absent
12/16/2013 1:30:05 PM | climateprediction.net | Output file hadcm3n_ob2t_1900_40_008469480_0_4.zip for task hadcm3n_ob2t_1900_40_008469480_0 absent

I will suspend the project, to keep the files. Thanks
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
The only possible way is with a complete backup of the CPDN files and folders. Sorry.

[Edit] You received credit for all Trickles returned, so not much is lost there. Even failed tasks can contain valuable information for the scientists, largely by defining the envelope of valid parameter sets.

[Edit2] The task in question hasn't been 'Reported' yet, so the error code isn't known. However, it shows 20 Trickles received, suggesting it was interrupted at the 50% mark, while generating the Restart Dump. Any interruption during that step is fatal.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
And there's no point in keeping the files; the model has failed. There's lots of info on the zip creation problem in Number Crunching if you want more.

Backups: Here
Joined: 21 Jul 05 Posts: 9 Credit: 163,228 RAC: 0
Thanks! So how do I know whether I can interrupt it or not? Let's say I want to shut down my laptop. Is there any (easy to read) message, so I can wait a few seconds before actually shutting it down? Regards,
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Just use the Show Graphics button to look at the model. In the bottom left corner is some info about the current state of the processing, including how many more time steps remain until the next checkpoint. After the number reaches zero it jumps to a high number (either 72 or 360 for the current types of model), at which point the model starts saving all of its open files. Let the number run down a bit to give it time, and then Suspend the model. (Perhaps wait until it reads 65 or 350, until you work out something for yourself. The closer it is to zero, the more of the model will have to be re-run when it's restarted.)

However, the usual 'catch' applies: if you've got other models waiting to run, they'll start instead, so any models waiting to start should be Suspended first.

There's another thing to watch for. When the model date gets close to the start of December, it's not far from a long pause while it converts lots of small files from one data type to another and then adds them to a zip file for uploading to the server. While it's doing this, a message appears below the others to say so. Do NOT interrupt it from the start of December until the zip has been created, and keep waiting until after the next checkpoint (when a new set of files will exist on the disk).

Backups: Here
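The suspend rule described in this reply can be written down as a small check. This is a minimal Python sketch, not anything provided by BOINC or CPDN: the function names, the `in_zip_window` flag (which you would have to set yourself from the model date and the on-screen zip message), and the use of 65 and 350 as fixed thresholds are all assumptions based on the rough numbers suggested in the post.

```python
# Hypothetical helper, not part of BOINC or CPDN. The thresholds are the rough
# suspend points suggested in the thread: 65 for 72-step models, 350 for 360-step models.
SUGGESTED_THRESHOLD = {72: 65, 360: 350}


def safe_to_suspend(steps_to_checkpoint: int, checkpoint_interval: int,
                    in_zip_window: bool) -> bool:
    """Return True if suspending now should not interrupt a file save.

    steps_to_checkpoint: the countdown shown in the graphics window.
    checkpoint_interval: the value the countdown resets to (72 or 360).
    in_zip_window: assumed to be set by the user from the start of December
        until the zip has been created and the following checkpoint has passed.
    """
    if in_zip_window:
        return False
    threshold = SUGGESTED_THRESHOLD.get(checkpoint_interval)
    if threshold is None:
        raise ValueError(f"no suggested threshold for interval {checkpoint_interval}")
    # When the countdown hits zero it resets to the interval and the model starts
    # saving, so values above the threshold (save in progress) and zero are unsafe.
    return 0 < steps_to_checkpoint <= threshold


def steps_to_rerun(steps_to_checkpoint: int, checkpoint_interval: int) -> int:
    """Time steps done since the last checkpoint, i.e. work redone after a restart."""
    return checkpoint_interval - steps_to_checkpoint


if __name__ == "__main__":
    print(safe_to_suspend(65, 72, in_zip_window=False))   # True
    print(safe_to_suspend(70, 72, in_zip_window=False))   # False: still saving
    print(safe_to_suspend(200, 360, in_zip_window=True))  # False: December zip window
    print(steps_to_rerun(65, 72))                         # 7
```

The 65/350 values are only a starting point. As the post says, suspending closer to zero is also safe but costs more re-run time, which is what `steps_to_rerun` reports.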
©2024 cpdn.org