Questions and Answers : Windows : "Time remaining" keeps counting up
Author | Message |
---|---|
Send message Joined: 24 Feb 08 Posts: 9 Credit: 876,602 RAC: 0 |
I have two models running on the same PC. One behaves normally: "elapsed time" increases while "time remaining" decreases by about the same amount. In the other model, "time remaining" is increasing by about the same amount as "elapsed time". Am I the victim of an endless loop, or is there a solution? |
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
This appears to be the model with the problem: hadsm3fub_00d0_005931581_8. The last trickle shows that the model used about ten times the normal amount of CPU to complete the trickle (145,400 vs 12,989 seconds). This does sometimes happen to slab models (i.e. hadsm3) and is usually accompanied by a blue temperature display. These ice worlds usually happen much later in the run - and don't recover. I would abort that model. |
Send message Joined: 24 Feb 08 Posts: 9 Credit: 876,602 RAC: 0 |
Hi Iain, You were right, it was that model with a blue temperature display, and I am going to abort it. Thanks for your comment. |
Send message Joined: 20 Jul 06 Posts: 4 Credit: 336,140 RAC: 0 |
I also have one of these models where "time remaining" is increasing, though somewhat slower than the CPU time usage. Temperature display is blue. The model is hadsm3fub_024m_005933871_0. I presume that I should abort it. |
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
I have never got a restored slab model to get past the point at which it "went blue", so aborting it is the only option as far as I can see. Bad luck for you, as that model was well into phase 3. |
Send message Joined: 20 Jul 06 Posts: 4 Credit: 336,140 RAC: 0 |
Thanks Iain - I do regret having to do this. I'll check for a few days longer to see if the progress percentage is increasing. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Gabriel, Your computer has a good record of model completion, so you're obviously doing the right things to look after your models. This problematic slab looks as if it's been trying for 5 days to get to the next trickle. It was previously doing about 1.8 sec/TS. The current sec/TS that you see in the graphics window is a cumulative average, so I wouldn't be surprised if it's doing 18 sec/TS now. You can note down its timestep and the wall clock time, close the graphics window, then look back 10 minutes later to see where it's got to and calculate its current speed. If it really is so slow and the graphics have gone, I'd abort it now. As Iain says, we've never seen one of these slow-processing models recover. We've seen the graphics of one of these slow 'iceworlds' that did complete because its owner battled on to the end. From where the slowdown occurred, the results were abnormal and unusable. It's just bad luck to get one of these. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
It seems a bitter pill but we can take consolation in knowledge that these failures provide valuable information to the researchers. Given that the Project tests parameter combinations, knowing what doesn't work is useful in establishing boundary conditions. My dimming memory tells me that one of the researchers (Dave Frame?) posted several years ago that some failures can be more valuable than some successful Runs. That said, however, it's more satisfying to complete a Model; for me, the longer the Model, the greater the satisfaction. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 5 Feb 05 Posts: 465 Credit: 1,914,189 RAC: 0 |
Science is all about the trials, failures, errors, mistakes and successes. How would we even know success without some of the others? Having done less than you, but a significant amount myself, I still feel that sense of accomplishment whenever a task finishes. I am now dabbling in the Beta project, with 200-year models. Talk about a long crunching time. My C2D T7200 says it will take upwards of 145 days to do nonstop. The 2.4G Xeon will take around 260 days. Lots of patience needed there, because they have already crashed a bunch with bad parameters, etc. All in a cruncher's day's work. |
Send message Joined: 20 Jul 06 Posts: 4 Credit: 336,140 RAC: 0 |
The progress percentage remains static. I'll abort the model. <sigh> Thank you all for the background and encouragement. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
You'll get credit for all the trickles the model sent. Better luck with the next model. By the way, if you only want a HADSM slab next time, you may have to wait until tomorrow as the HADSM work queue is currently empty. |
Send message Joined: 14 Jan 07 Posts: 52 Credit: 284,001 RAC: 0 |
Recently had to abort this ice ball. Poor thing had been going less than 48 hours. It was (I thought) happily crunching away, running at 1.30 T/S and reporting every 4 hours, but after 10 trickles it turned blue, and 7 hours later it had only completed 6000 of the 10,800 timesteps needed for the next trickle; the T/S was up to 1.39 by then. Suspended at 144 on the countdown [to reboot Boinc, worth a try!]; it fell back to the previous checkpoint, knocking 0.02% off the progress meter, and the T/S fell back to 1.33. So a rapid time increase in the last countdown. Crashed a slab over at beta a while ago, also suspending at 144 on the countdown (to take a backup); immediately after restart it crashed, as expected, did the backup. So now I always treat the start of any countdown as 1 below, e.g. 143 for a slab, and have had no problems with work loss or crashes since. Had 2 ice balls out of 11; the first WU was completed successfully by another user running an AMD machine and the last is being run by another Intel user, so I'm interested in how they get on. |
Send message Joined: 4 Oct 07 Posts: 1 Credit: 14,817 RAC: 0 |
Aargh, and I wondered why nothing was happening. The globe is frozen at 68,240%. Strangely, the percentage increases sometimes, but then resets to 68,240%. And the time display shows 2051 - later than the planned end point. I will abort this model then. |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
What does the temperature display show? ('T' when the globe is displayed), and the model speed? ('Z' when graphics are showing to remove grey sidebar - the speed figure is marked 's/ts'). http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6914668 Note that Phase three *starts* at 2050 and continues to 2065 (see the 'running your model' readme, 'information' section, first two posts describing types of model - link in my signature). I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Chieron, It's a genuine ice world, as this unfortunate run of trickles shows for another Intel/Windows host in the same work unit: here. Intel/Linux and AMD/Windows results have completed in that work unit. Abort it. Iain [Oops, sorry Mike.] |
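
The speed check suggested earlier in the thread (note the timestep and the wall clock time, look again about 10 minutes later, then work out the current speed) can be sketched roughly as below. This is only an illustration, not anything from the CPDN or BOINC software: the function names and the 10-minute readings are made up for the example, and it assumes the model has a CPU core to itself, so that wall clock time is a fair stand-in for CPU time. The second calculation uses the figures quoted above (6000 timesteps in 7 hours, against an earlier 1.30).

```python
def sec_per_timestep(ts_first, ts_second, seconds_between):
    """Average seconds per timestep between two readings of the model.

    ts_first / ts_second: timestep counts noted from the graphics window.
    seconds_between: wall clock seconds between the two readings.
    """
    steps = ts_second - ts_first
    if seconds_between <= 0:
        raise ValueError("the second reading must be taken after the first")
    if steps <= 0:
        raise ValueError("no timesteps were completed between the readings")
    return seconds_between / steps


def slowdown_factor(current_speed, earlier_speed):
    """How many times slower the model is now than it was earlier."""
    return current_speed / earlier_speed


# Hypothetical readings taken 10 minutes (600 s) apart.
ten_minute_check = sec_per_timestep(250_000, 250_033, 600)
print(f"current speed: {ten_minute_check:.2f} s/TS")  # 18.18 s/TS

# Figures quoted in the thread: 6000 timesteps in 7 hours, previously 1.30.
recent = sec_per_timestep(0, 6000, 7 * 3600)
print(f"recent speed: {recent:.2f} s/TS")  # 4.20 s/TS
print(f"slowdown: {slowdown_factor(recent, 1.30):.1f}x")  # 3.2x
```

Because the figure in the graphics window is a cumulative average over the whole run, it moves far more slowly than the recent speed worked out this way, which is why a model can look only slightly slower on screen while actually crawling.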
©2025 cpdn.org