HADCM3PN DEAD???
Author | Message |
---|---|
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I think that I have a problem. Hadcm3n_yfok_1980_40_00784442_0 reached 100% a few hours ago, but there is no sign of the final zip file. The BOINC Manager says that the WU is still "running" instead of uploading, and the elapsed time is still going up. According to the graphics, the model is stuck at 99.97%, at timestep 1038232. The messages are a bit confusing because of all the backed-up zip files from the hadam3p_eu model that can't upload due to a server problem. Messages below:
4/29/2012 11:44:50 AM | | Resuming network activity
4/29/2012 11:44:50 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_4.zip
4/29/2012 11:44:50 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_5.zip
4/29/2012 11:44:50 AM | climateprediction.net | Sending scheduler request: To send trickle-up message.
4/29/2012 11:44:50 AM | climateprediction.net | Requesting new tasks for CPU
4/29/2012 11:44:58 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_4.zip: transient HTTP error
4/29/2012 11:44:58 AM | climateprediction.net | Backing off 4 hr 9 min 36 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_4.zip
4/29/2012 11:44:58 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_5.zip: transient HTTP error
4/29/2012 11:44:58 AM | climateprediction.net | Backing off 5 hr 34 min 14 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_5.zip
4/29/2012 11:44:58 AM | climateprediction.net | Started upload of hadam3p_pnw_c6g3_1970_1_007941314_0_2.zip
4/29/2012 11:44:58 AM | climateprediction.net | Started upload of hadam3p_pnw_c6g3_1970_1_007941314_0_3.zip
4/29/2012 11:45:01 AM | climateprediction.net | Scheduler request completed: got 1 new tasks
4/29/2012 11:45:03 AM | climateprediction.net | Started download of hadam3p_pnw_cbzw_1963_1_007948507.zip
4/29/2012 11:45:05 AM | climateprediction.net | Finished download of hadam3p_pnw_cbzw_1963_1_007948507.zip
4/29/2012 11:46:00 AM | climateprediction.net | Finished upload of hadam3p_pnw_c6g3_1970_1_007941314_0_2.zip
4/29/2012 11:46:00 AM | climateprediction.net | Finished upload of hadam3p_pnw_c6g3_1970_1_007941314_0_3.zip
4/29/2012 11:46:01 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_6.zip
4/29/2012 11:46:01 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_7.zip
4/29/2012 11:46:06 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_6.zip: transient HTTP error
4/29/2012 11:46:06 AM | climateprediction.net | Backing off 2 hr 34 min 9 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_6.zip
4/29/2012 11:46:06 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_7.zip: transient HTTP error
4/29/2012 11:46:06 AM | climateprediction.net | Backing off 30 min 7 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_7.zip
4/29/2012 11:46:08 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip
4/29/2012 11:46:08 AM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip
4/29/2012 11:46:09 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip: transient HTTP error
4/29/2012 11:46:09 AM | climateprediction.net | Backing off 9 min 14 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip
4/29/2012 11:46:09 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip: transient HTTP error
4/29/2012 11:46:09 AM | climateprediction.net | Backing off 14 min 44 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip
4/29/2012 11:46:31 AM | | Suspending network activity - user request
4/29/2012 11:48:06 AM | | Resuming network activity
4/29/2012 11:48:19 AM | climateprediction.net | update requested by user
4/29/2012 11:48:22 AM | climateprediction.net | Sending scheduler request: Requested by user.
4/29/2012 11:48:22 AM | climateprediction.net | Not reporting or requesting tasks
4/29/2012 11:48:24 AM | climateprediction.net | Scheduler request completed
4/29/2012 12:05:33 PM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip
4/29/2012 12:05:33 PM | climateprediction.net | Started upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip
4/29/2012 12:05:35 PM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip: transient HTTP error
4/29/2012 12:05:35 PM | climateprediction.net | Backing off 21 min 16 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_8.zip
4/29/2012 12:05:35 PM | climateprediction.net | Temporarily failed upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip: transient HTTP error
4/29/2012 12:05:35 PM | climateprediction.net | Backing off 18 min 18 sec on upload of hadam3p_eu_8fap_2001_1_007868816_0_9.zip
IS THE MODEL DEAD? Should I try again? I have a backup from 2 days ago. UPDATE: The WU crashed. I am now running the restored backup with 40 hours left. |
Joined: 5 Aug 04 Posts: 6 Credit: 184,430 RAC: 0 |
As you may have noticed, some upload servers are out of service. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
According to the posts, and in my experience, the hadcm3n models are unaffected by uploader1.atm being out of action, as they use a different server. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Bad news to report. The restored WU progressed to the exact same spot at 99.97% and hung up again. It has been aborted. One thing I was wondering about: I recently upgraded BOINC from 6.10.58 to the new version, 7.0.25 (x64). Could upgrading while the hadcm3n was running have caused this? Has anyone else finished a CM model with the new BOINC Manager? I still have the WU backup and a copy of the 6.10.58 manager stored on my computer if you think it might help. The CMs are such a big commitment of time (about 60 days) that I hate to just give up on this one. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There is a condition, of unknown cause, whereby the Coupled Ocean models will get to a point where the data is usually gathered up, zipped, and sent back to the server, and then the model just stops. I've had one or two that didn't even produce the zip. The model doesn't even self-abort, it just sits there doing nothing. It's been discussed at the project level, and something may come of it in time. However, lots of models do complete successfully, although I don't know how many, what percentage, etc. Backups: Here |
Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
"However, lots of models do complete successfully, although I don't know how many, what percentage, etc." From four PCs running XP or Linux with BM 6.n.n, the results are: 85 CM3n started, 66 completed, 19 failed at the 25/50/75/100% points. That is, just over 75% of CM3n tasks (66/85, about 78%) completed successfully, and just under 25% (19/85, about 22%) failed at the zip points. |
Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
"There is one thing that I was wondering. I recently upgraded to the new version [7.0.25 (x64)] of Boinc from the 6.10.58. Could upgrading while the hadcm3n was running have caused this? Has anyone else finished a CM model with the new Boinc manager?" Jim, for example, CM task 14363087 completed on BM 7.0.25. Note that, as per the release notes, BM 7 makes changes to client_state.xml, and there is a V7-to-V6 downgrade incompatibility. On balance, I'd be more suspicious of the empirically high failure rate of these CM3n models at the zip points. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Thanks, everyone. I guess that the WU is just dead. It is hard to give up on it when you have 700+ hours of crunching invested. |
©2024 cpdn.org