Message boards : Number crunching : Bug in Hadcm3n
Author | Message |
---|---|
Joined: 28 Nov 06 Posts: 89 Credit: 12,007,915 RAC: 3,381 |
Hello! This task stopped at time step 529992 and is showing no signs of further progress. At the moment the task is suspended, but not yet aborted. Any recommendations? P.S. This kind of situation (or bug) is the worst-case scenario for remote hosts... Thank you! |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
A Windows/Intel computer in that work unit got farther than the one on your computer, so there doesn't appear to be a bug in the model. The model on your computer has stopped at trickle #20, which may suggest a problem during preparation of the 10-year Zip file upload. Was the model doing anything at all before you suspended it? Or had the percentage progress reverted to zero or something similar? |
Joined: 28 Nov 06 Posts: 89 Credit: 12,007,915 RAC: 3,381 |
"A Windows/Intel computer in that work unit got farther than the one on your computer, so there doesn't appear to be a bug in the model." Are both models identical? "The model on your computer has stopped at trickle #20, which may suggest a problem during preparation of the 10-year Zip file upload." Trickle #20 is at time step 518,400; the model stopped later, at time step 529,992, which is 72 steps before the next checkpoint. "Was the model doing anything at all before you suspended it? Or had the percentage progress reverted to zero or something similar?" The model was running normally and the progress was normal; at least, I saw no anomalies beforehand. I suspended it only after I found that it had stopped. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
"Are both models identical?" The model specification is identical, but the model may develop differently on different hardware. If both platforms are Windows/Intel then the model development will usually be identical. "Trickle #20 is at time step 518,400; the model stopped later, at time step 529,992, which is 72 steps before the next checkpoint." At ~3.7 s/timestep that's ~12 hours after the decade trickle. The model should be well clear of anything Zip-related. "The model was running normally and the progress was normal; at least, I saw no anomalies beforehand. I suspended it only after I found that it had stopped." What happens when it is restarted? |
Joined: 28 Nov 06 Posts: 89 Credit: 12,007,915 RAC: 3,381 |
"What happens when it is restarted?" Nothing positive, and I tried every kind of restart, from "suspend / resume" to "turn off / turn on computer". :-) Unfortunately! I found a second stopped model, hadcm3n_yms2_1940_40_007432202_1, on another host. Its last trickle was sent at time step 259200; the model stopped at time step 259488, again 72 steps before the next checkpoint. The screen saver shows a constant 130+ hours elapsed, while BOINC Manager already shows 176+ hours. Details: 1. Both hosts are using BOINC 6.12.33. With this version of BOINC I sometimes see messages similar to "BOINC screen saver diagnostics error" when the screen saver starts. 2. BOINC Manager is "pulling the wool over my eyes" :-) : elapsed time is going up and remaining time is going down. So in the Manager it looks as if the process is running normally; maybe this is normal behaviour for BOINC Manager, because contact with the task may have been lost. What can be wrong, in the end? IMHO, it does not look as if only BOINC Manager's progress bar is frozen. It looks as if the models are really dead, because hadcm3n_yms2_1940_40_007432202_1 sent its last trickle on 8 September, one week ago. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
This looks like something new: I'll pass it on and see if anyone else knows what's going on. The "BOINC screen saver diagnostics error" is fixed in 6.12.34. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
If you open the model's graphics and look at the timesteps and countdown, do you see the model repeating the same timesteps again and again, i.e. a sort of looping behaviour? |
Joined: 28 Nov 06 Posts: 89 Credit: 12,007,915 RAC: 3,381 |
"If you open the model's graphics and look at the timesteps and countdown, do you see the model repeating the same timesteps again and again, i.e. a sort of looping behaviour?" All of the scenes and pictures look normal, but they are absolutely STATIC - no signs of life at all. |
Joined: 28 Nov 06 Posts: 89 Credit: 12,007,915 RAC: 3,381 |
"The 'BOINC screen saver diagnostics error' is fixed in 6.12.34." Thank you for the info! |
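
For anyone wanting to check the timing figures quoted in this thread, here is a minimal Python sketch of the arithmetic. It only uses numbers posted above (time steps 518,400, 529,992, 259,200 and 259,488, and the rough ~3.7 s/timestep estimate). The function name and the idea that trickles are evenly spaced from time step 0 are assumptions made for illustration, not something the project has confirmed.

```python
# Rough check of the timing figures quoted in the thread.
# SECONDS_PER_STEP is the approximate value given for the first host only.
SECONDS_PER_STEP = 3.7


def steps_and_hours(last_trickle_step, stopped_step, seconds_per_step):
    """Return (time steps, wall-clock hours) between the last trickle and the stall."""
    steps = stopped_step - last_trickle_step
    hours = steps * seconds_per_step / 3600.0
    return steps, hours


# First model: trickle #20 at step 518,400, stalled at step 529,992.
steps1, hours1 = steps_and_hours(518_400, 529_992, SECONDS_PER_STEP)

# Second model: last trickle at step 259,200, stalled at step 259,488.
# No s/timestep figure was posted for that host, so only the step gap is used.
steps2 = 259_488 - 259_200

# ASSUMPTION: trickles are evenly spaced starting from time step 0.
steps_per_trickle = 518_400 // 20
second_model_trickle = 259_200 // steps_per_trickle

print(f"First model:  {steps1} steps (~{hours1:.1f} h) after trickle #20")
print(f"Second model: {steps2} steps after its last trickle")
print(f"Implied trickle spacing: {steps_per_trickle} steps; "
      f"second model's last trickle would be #{second_model_trickle}")
```

Under those assumptions the first model ran for 11,592 steps (a little under 12 hours) past the decade trickle, which matches the "~12 hours" estimate, while the second model stalled only 288 steps after its last trickle.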
©2024 cpdn.org