climateprediction.net (CPDN) home page
Thread 'Aborting Tasks'

Message boards : Number crunching : Aborting Tasks
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 58036 - Posted: 31 Mar 2018, 13:26:43 UTC

Does aborting a task make it immediately available for re-issue - or thereabouts?
ID: 58036
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 58037 - Posted: 31 Mar 2018, 15:38:22 UTC - in response to Message 58036.  

Pretty much, as long as the work unit hasn't reached its "Max # of errors" total, which is usually 3 for the work units being sent out nowadays.
ID: 58037
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 58040 - Posted: 31 Mar 2018, 18:29:46 UTC - in response to Message 58037.  

Thanks, I thought that was probably the case.
ID: 58040
Alan K

Joined: 22 Feb 06
Posts: 491
Credit: 30,975,898
RAC: 14,500
Message 58473 - Posted: 24 Jul 2018, 22:12:58 UTC

Following the recent outage, the work units I have got include some from earlier batches: 660, 709 and 719. All are on their third go. Should I allow these to run, or should I abort them?
ID: 58473
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58474 - Posted: 24 Jul 2018, 23:13:02 UTC

Nothing in the 700 range has been closed yet, so those 2 are OK.
I don't see 660 in the closed list either, but it is getting a bit old. Best to get it run soonish.

Yesterday I completed a 647 that one of my computers picked up 12 hours before the nam50s came out, so it's good to get that out of the way.

That leaves a 719 still running, with 2 days to go. It has taken just under 8 days of run time so far to reach 85%, if that helps.
ID: 58474
Alan K

Joined: 22 Feb 06
Posts: 491
Credit: 30,975,898
RAC: 14,500
Message 58475 - Posted: 25 Jul 2018, 8:15:18 UTC - in response to Message 58474.  

Thanks, Les. The 660 is about 25% complete after just under 2 days, so it shouldn't take much longer. The 709 and 719 haven't started yet. The 735s I have had have been taking between 3 and 4 days to complete. I'll keep plodding.
ID: 58475
bernard_ivo

Send message
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 58654 - Posted: 28 Aug 2018, 13:10:59 UTC

Hi folks,
I got a hadcm3s_a18n_203412_120_599 on my Linux box. It is on its 3rd attempt, and I wonder whether I should let it finish or whether this batch is no longer of scientific interest.
ID: 58654
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4538
Credit: 19,002,360
RAC: 21,497
Message 58656 - Posted: 28 Aug 2018, 17:28:20 UTC - in response to Message 58654.  

> I got a hadcm3s_a18n_203412_120_599 on my Linux box. It is on its 3rd attempt, and I wonder whether I should let it finish or whether this batch is no longer of scientific interest.


It is still in the current list, not in the "batches to be closed" or "closed" lists. But, as you note, it is quite old.
ID: 58656
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58657 - Posted: 28 Aug 2018, 21:15:20 UTC

But it's a "short", and shorts are usually a challenge to complete, so you could have a go at it just to see whether your computer can do it. :)
ID: 58657
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 58675 - Posted: 1 Sep 2018, 19:32:31 UTC - in response to Message 58657.  

I ruined it. I did a system update before the WU finished, and after the restart it ended with a computation error. Here is the error. I had updated from 14.04 to 16.04 LTS, and after the crash I checked for missing libraries and had to install gcc.4.7-multilib. I'm not sure whether this was the reason, but I really should have let it finish first, as it was on its 3rd attempt.
ID: 58675


©2024 cpdn.org