FIX FOR BAD TIME REMAINING ESTIMATES?
Author | Message |
---|---|
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Is there some way to edit the time-remaining estimate so that it more closely reflects reality? Right now I have a pair of hadcm3n WUs running on my slower machine. The estimated time to completion at download is 1318 hours. I know from experience that CMs finish on that machine in approx. 825 hours. The reason that this is worth fixing is that it can interfere with downloading new work in a timely fashion. My work buffer is set at the maximum of 10 days (240 hours). With work thin on the ground this can be a problem. Until the "remaining" calculation reaches 240 hours, BOINC won't even start requesting new tasks. If the estimate were accurate this would not be a problem, but with it inflated by about 80%, I usually have only about 96 real hours of crunching left by the time it says 240. It is frustrating to see that there are WUs available and not be able to download them because I haven't reached the magic 240. I know that this problem is supposed to be self-correcting as we complete WUs, but with hadcm3s that take about 60 days each, it can take nearly a year to fix itself. This is made worse by the fact that this problem seems to recur every time I upgrade BOINC. Is there some edit that I can make in client_state or some other file to fix this? |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
|
Send message Joined: 6 Nov 08 Posts: 1 Credit: 195,828 RAC: 0 |
One of the work units is computing. The elapsed time progresses, but the remaining time doesn't move and the percentage done stays stuck at 50.104%.... |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,008,987 RAC: 21,524 |
"One of the work units is computing. The elapsed time progresses, but the remaining time doesn't move and the percentage done stays stuck at 50.104%...." This sounds like one of the ways these models crash at or around the times the zip files are created. Look at the graphics and see if the model is stuck in a loop. You will probably need to abort it. |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,008,987 RAC: 21,524 |
"Perhaps info in this thread may help." I am currently getting a page-not-found error on that link. |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
"Perhaps info in this thread may help." That was a page on the phpBB forum; unfortunately, it was closed down a few months ago. I'm a volunteer and my views are my own. |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
"One of the work units is computing. The elapsed time progresses, but the remaining time doesn't move and the percentage done stays stuck at 50.104%...." I have the same or a similar problem with one of my work units. It is stuck at 49.930%. But graphics for CPDN work units have never worked on this machine. The statistics show that the average for CPDN is dropping and the total has flat-lined, while the statistics for the other two BOINC projects show increases. I have "No new work units" selected for all projects, since I want to install a new version of my O/S soon. The O/S is Ubuntu with KDE. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The model has probably failed (at the second 25% point). Without graphics to watch the progress for a few minutes, the only thing that you can do is set all projects to No new tasks (in the Projects tab), then Suspend each running work unit (in the Tasks tab), Suspend BOINC (in the menu), then Exit from BOINC. Now restart BOINC and unsuspend it, then unsuspend each of the tasks in the Tasks tab and let the climate model run for a while. (This could take a while, as you'll have to do whatever you did before to see if it looks like it's running or stopped.) If this process doesn't seem to work, then Abort the climate model. You can set projects for more work at some stage, but climate models are in very short supply at the moment. |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,008,987 RAC: 21,524 |
"But, graphics for CPDN work units have never worked on this machine." If you start BOINC from a terminal and it is the same problem I had, then when you hit the graphics button you will see in the terminal output which library is missing. Or at least I did with Ubuntu 12.10. Since breaking things in the upgrade from 13.04 to 13.10, I have had to do a clean install and have not had any models to crunch. Graphics work in WCG, but I am not sure yet whether that means they will work in CPDN. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"But, graphics for CPDN work units have never worked on this machine." If graphics on Linux are not working, see this thread. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"The model has probably failed. (At the 2nd 25% point.)" Yup, what Les said. Just the other day I had a model that zombied out like that at 99.79%, after the last trickle but before the last big upload. Bummer. But if the clean shutdown and restart that Les described doesn't help, the only thing to do is kill it. (The other possibility is to restart from a clean backup, but that gets fiddly and tricky, especially if you have other models running on the same machine, and it only works sometimes.) |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
When I first started suspecting something was wrong, the estimated time to completion would be around 214 hrs. One time it increased to 225 hrs. Now, when I suspend it and resume it, it always restarts with 212:01 elapsed and 214:37 estimated remaining. But the progress doesn't change from 49.930% as it runs. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
In that case it looks like it's in a loop, and will remain so until the computer wears out. Or Hell freezes over. So just Abort it. |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
I did abort it. I suspect it could be a result of some sort of corruption on my computer: The same thing later happened to two World Community Grid tasks. (I don't remember which WCG project was involved with the first one. The current one is the Clean Energy Project.) |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
"I did abort it. I suspect it could be a result of some sort of corruption on my computer: ..." Usually it is when the task goes out of memory at the wrong moment (there are a few sensitive points in the model's processing where it cannot cope with being saved & reloaded). |
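
For anyone who wants to sanity-check the numbers in the opening post, here is a minimal sketch of the arithmetic. It assumes the displayed estimate scales linearly with real run time, which is a simplification and not necessarily how BOINC computes "remaining"; the original poster reports seeing only about 96 real hours at the 240-hour mark, so treat the result as a rough guide only. The function names are made up for illustration and are not part of BOINC.

```python
# Rough arithmetic for the estimates quoted in the opening post.
# Assumption: the displayed estimate is a constant multiple of real run time.

def correction_factor(estimated_hours: float, actual_hours: float) -> float:
    """Ratio of real run time to the estimate shown at download."""
    return actual_hours / estimated_hours


def real_hours(displayed_remaining: float, factor: float) -> float:
    """Scale a displayed 'remaining' figure down to real hours."""
    return displayed_remaining * factor


ESTIMATED_AT_DOWNLOAD = 1318.0  # hours, from the opening post
ACTUAL_RUN_TIME = 825.0         # hours, from the opening post
WORK_BUFFER = 240.0             # hours (10 days), from the opening post

factor = correction_factor(ESTIMATED_AT_DOWNLOAD, ACTUAL_RUN_TIME)
overestimate_pct = (ESTIMATED_AT_DOWNLOAD / ACTUAL_RUN_TIME - 1.0) * 100.0

print(f"correction factor: {factor:.3f}")
print(f"estimate is too high by about {overestimate_pct:.0f}%")
print(f"{WORK_BUFFER:.0f} displayed hours is roughly "
      f"{real_hours(WORK_BUFFER, factor):.0f} real hours")
```

Under the linear assumption, the same factor can be applied to any displayed "remaining" figure to guess how much real crunching is left before BOINC will ask for more work.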
©2024 cpdn.org