climateprediction.net (CPDN) home page
Thread 'FIX FOR BAD TIME REMAINING ESTIMATES?'

Message boards : Number crunching : FIX FOR BAD TIME REMAINING ESTIMATES?

JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 46234 - Posted: 16 May 2013, 1:22:57 UTC

Is there some way to edit the time remaining estimate so that it more closely reflects reality? Right now I have a pair of hadcm3n WUs running on my slower machine. The estimated time to completion at download is 1318 hours. I know from experience that CMs finish on that machine in approx. 825 hours.

The reason this is worth fixing is that it can interfere with downloading new work in a timely fashion. My work buffer is set at the maximum of 10 days (240 hours). With work thin on the ground, this can be a problem. Until the "remaining" calculation reaches 240 hours, BOINC won't even start requesting new tasks. If the estimate were accurate this would not be a problem, but with it inflated by about 80%, I usually have only about 96 real hours of crunching left by the time it says 240. It is frustrating to see that there are WUs available and not be able to download them because I haven't reached the magic 240.

I know that this problem is supposed to be self-correcting as we complete WUs, but with hadcm3s taking about 60 days each, it can take nearly a year to fix itself. This is made worse by the fact that the problem seems to recur every time I upgrade BOINC. Is there some edit I can make in client_state or some other file to fix this?
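A minimal sketch of the arithmetic behind the request, using only the two figures quoted above (1318 hours estimated at download, roughly 825 hours in practice) and assuming the overestimate behaves like a constant multiplier. That is a simplification for illustration only: it is not how BOINC itself computes its estimate, and the function names are hypothetical.

```python
def correction_factor(estimated_hours: float, actual_hours: float) -> float:
    """Multiplier that maps an inflated estimate onto the observed run time."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return actual_hours / estimated_hours


def corrected_estimate(raw_estimate_hours: float, factor: float) -> float:
    """Scale a raw remaining-time estimate by the correction factor."""
    return raw_estimate_hours * factor


if __name__ == "__main__":
    # Figures from the post: 1318 h estimated at download, ~825 h observed.
    factor = correction_factor(1318.0, 825.0)
    print(f"correction factor: {factor:.3f}")
    # Under the constant-ratio assumption, 1318 h shown means about 825 h real.
    print(f"corrected estimate: {corrected_estimate(1318.0, factor):.0f} h")
```

Run as a script, it prints a correction factor of about 0.626, i.e. under this simple model the displayed estimate would need to be scaled to roughly five eighths of its value.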


geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 46242 - Posted: 16 May 2013, 18:33:35 UTC

old_user544258

Joined: 6 Nov 08
Posts: 1
Credit: 195,828
RAC: 0
Message 47251 - Posted: 8 Oct 2013, 4:07:56 UTC

One of the work units is computing. The elapsed time progresses, but the remaining time doesn't move and the percentage done stays stuck at 50.104%...
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 47252 - Posted: 8 Oct 2013, 6:30:28 UTC - in response to Message 47251.  

This sounds like one of the ways these models crash at or around the times the zip files are created. Look at the graphics and see if the model is stuck in a loop. You will probably need to abort it.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 47253 - Posted: 8 Oct 2013, 6:32:04 UTC

Perhaps info in this thread may help


I am currently getting a page not found error on that link.
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47254 - Posted: 8 Oct 2013, 10:42:10 UTC - in response to Message 47253.  

That was a page on the phpBB forum, which unfortunately was closed down a few months ago.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
old_user701332

Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 47418 - Posted: 28 Oct 2013, 15:53:11 UTC - in response to Message 47252.  

I have the same or a similar problem with one of my work units. It is stuck at 49.930%. But graphics for CPDN work units have never worked on this machine.

The statistics show that my average for CPDN is dropping and the total has flat-lined, while the statistics for the other two BOINC projects show increases.

I have "No new work units" selected for all projects, since I want to install a new version of my OS soon. The OS is Ubuntu with KDE.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47420 - Posted: 28 Oct 2013, 19:20:30 UTC - in response to Message 47418.  

The model has probably failed. (At the 2nd 25% point.)

Without graphics to watch the progress for a few minutes, the only thing you can do is set all projects to No new tasks (in the Projects tab), then Suspend each running work unit (in the Tasks tab), Suspend BOINC (in the menu), then Exit from BOINC.

Now restart BOINC and unsuspend it, then unsuspend each of the tasks in the Tasks tab and let the climate model run for a while. (This could take a while, as you'll have to do whatever you did before to see whether it looks like it's running or stopped.)
If this process doesn't seem to work, then Abort the climate model.

You can set the projects to accept more work again at some stage, but climate models are in very short supply at the moment.
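For anyone who prefers the command line, the same clean shutdown could be sketched with BOINC's `boinccmd` tool. This is a hedged sketch, not a tested procedure: the project URL and task name below are hypothetical placeholders you would replace with the values reported by `boinccmd --get_project_status` and `boinccmd --get_tasks`, and the "restart BOINC" step is left out because how the client is started depends on how BOINC was installed. By default the script only prints the commands.

```python
import subprocess
from typing import Dict, List

Command = List[str]


def shutdown_commands(tasks_by_project: Dict[str, List[str]]) -> List[Command]:
    """No new tasks for every project, suspend each task, suspend BOINC, exit."""
    cmds: List[Command] = []
    for url in tasks_by_project:
        cmds.append(["boinccmd", "--project", url, "nomorework"])
    for url, names in tasks_by_project.items():
        for name in names:
            cmds.append(["boinccmd", "--task", url, name, "suspend"])
    cmds.append(["boinccmd", "--set_run_mode", "never"])  # like "Suspend" in the Manager
    cmds.append(["boinccmd", "--quit"])                   # like "Exit" from BOINC
    return cmds


def resume_commands(tasks_by_project: Dict[str, List[str]]) -> List[Command]:
    """After the client has been restarted: unsuspend BOINC, then each task."""
    cmds: List[Command] = [["boinccmd", "--set_run_mode", "auto"]]
    for url, names in tasks_by_project.items():
        for name in names:
            cmds.append(["boinccmd", "--task", url, name, "resume"])
    return cmds


def run(cmds: List[Command], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own project URL and task names.
    tasks = {"http://climateprediction.net/": ["example_task_name_0"]}
    run(shutdown_commands(tasks))
```

If the model still looks stuck after resuming, `boinccmd --task URL NAME abort` is the command-line counterpart of Abort.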

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 47430 - Posted: 29 Oct 2013, 6:42:01 UTC - in response to Message 47418.  

But graphics for CPDN work units have never worked on this machine.



If you start BOINC from a terminal, then when you hit the graphics button you will see in the terminal output which library is missing, if it is the same problem I had. Or at least I did with Ubuntu 12.10. Since breaking things in the upgrade from 13.04 to 13.10, I have had to do a clean install and have not had any models to crunch. Graphics work in WCG, but I am not sure yet whether that means they will work in CPDN.
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 47433 - Posted: 29 Oct 2013, 8:58:46 UTC - in response to Message 47430.  

If graphics on Linux aren't working, see this thread.

Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 47434 - Posted: 29 Oct 2013, 9:22:12 UTC - in response to Message 47420.  

Yup, what Les said. Just the other day I had a model that zombied out like that at 99.79%, after the last trickle but before the last big upload. Bummer. But if the clean shutdown and restart that Les described doesn't help, the only thing to do is kill it. (The other possibility is to restart from a clean backup, but that gets fiddly and tricky, especially if you have other models running on the same machine, and it only works sometimes.)

old_user701332

Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 47445 - Posted: 30 Oct 2013, 18:12:10 UTC - in response to Message 47420.  

When I first started suspecting something was wrong, the estimated time to completion was around 214 hrs. One time it increased to 225 hrs.

Now, when I suspend and resume it, it always restarts with 212:01 elapsed and 214:37 estimated remaining. But the progress doesn't change from 49.930% as it runs.
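The symptom described here (elapsed time climbing while the percentage stays put) can be written as a simple check over a few readings taken some time apart. This is a hypothetical helper for illustration only: the one-hour threshold and the tolerance are arbitrary assumptions, and the readings would have to be noted by hand or taken from BOINC's task list.

```python
from typing import Sequence, Tuple

Reading = Tuple[float, float]  # (elapsed hours, fraction done between 0 and 1)


def looks_stalled(readings: Sequence[Reading],
                  min_elapsed_gain: float = 1.0,
                  tolerance: float = 1e-6) -> bool:
    """True if elapsed time advanced by at least min_elapsed_gain hours
    while the fraction done stayed within tolerance across all readings."""
    if len(readings) < 2:
        return False  # not enough data to judge
    elapsed = [e for e, _ in readings]
    fractions = [f for _, f in readings]
    time_passed = max(elapsed) - min(elapsed)
    progress = max(fractions) - min(fractions)
    return time_passed >= min_elapsed_gain and progress <= tolerance


if __name__ == "__main__":
    # Readings resembling the post: stuck at 49.930% from about 212 h onward.
    stuck = [(212.02, 0.49930), (213.50, 0.49930), (215.00, 0.49930)]
    print(looks_stalled(stuck))  # True
```

A check like this only says progress has stopped; it cannot tell a looping model from one that will recover, so the decision to abort still rests on the judgment described in the replies.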

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47446 - Posted: 30 Oct 2013, 19:03:01 UTC - in response to Message 47445.  

In that case it looks like it's in a loop, and will remain so until the computer wears out. Or Hell freezes over.
So just Abort it.


old_user701332

Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 47639 - Posted: 23 Nov 2013, 21:54:43 UTC - in response to Message 47446.  

I did abort it. I suspect it could be a result of some sort of corruption on my computer: The same thing later happened to two World Community Grid tasks.

(I don't remember which WCG project was involved with the first one. The current one is the Clean Energy Project.)
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47642 - Posted: 25 Nov 2013, 12:20:41 UTC - in response to Message 47639.  

I did abort it. I suspect it could be a result of some sort of corruption on my computer: ...


Usually it happens when the task is removed from memory at the wrong moment (there are a few sensitive points in the model's processing where it cannot cope with being saved and reloaded).




©2024 cpdn.org