climateprediction.net (CPDN)
Thread 'Not complete - running but not using CPU'

Message boards : Number crunching : Not complete - running but not using CPU

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 46694 - Posted: 24 Jul 2013, 12:07:19 UTC

Had 3 models in the last 2 weeks that got near the end and then stopped running. BOINC Manager still showed them as running, but Task Manager showed them not using any CPU. The graphics display stayed stuck at the same point for a day or two (real days; the model days never progressed). After a day or two I killed all 3. Two of them were at 99% plus and one was in the 80% range.
It seemed weird that the tasks would show as running in BOINC while the graphics screen showed the same point for days and the system showed no CPU usage.
Killed the jobs and moving on, but strange, strange, strange.
Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 46699 - Posted: 24 Jul 2013, 21:51:56 UTC - in response to Message 46694.  

I've had that too, with one model.

A system reboot fixed it, allowed it to carry on and finish successfully. (I needed to apply some updates anyway...)
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 46702 - Posted: 24 Jul 2013, 23:54:09 UTC - in response to Message 46699.  

Unfortunately, a reboot did not help the ones I had.
liz
Joined: 13 May 12
Posts: 2
Credit: 191,869
RAC: 0
Message 46720 - Posted: 27 Jul 2013, 22:03:14 UTC - in response to Message 46702.  

I've got one that I think is doing the same: stuck at 25.193% for about 10 days. After each reboot I lose "elapsed time"; last night it showed 156 hours done and this morning it showed 152 hours. After reading this thread I checked, and it doesn't show up in top, so I've killed the job.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46721 - Posted: 27 Jul 2013, 22:13:06 UTC - in response to Message 46720.  

Ah, the dreaded "Failure at zip time" problem.

liz
Joined: 13 May 12
Posts: 2
Credit: 191,869
RAC: 0
Message 46722 - Posted: 28 Jul 2013, 6:06:36 UTC - in response to Message 46721.  

Sorry Les, I'm not familiar with that. Could you send me a link to an explanation?
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46723 - Posted: 28 Jul 2013, 8:32:23 UTC - in response to Message 46722.  

There are posts all through this board about how sensitive this type of model is to being interrupted, especially at each of the 25% points.
And then there are a few that complete but then just run in a short loop just past the finish point.

There's a short, fairly recent thread here about the former, and one about the latter here.

©2024 cpdn.org