climateprediction.net (CPDN) home page
Thread 'New small batches of long runs --> with problems'

Message boards : Number crunching : New small batches of long runs --> with problems

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,366,450
RAC: 15,463
Message 56192 - Posted: 10 May 2017, 22:25:25 UTC - in response to Message 56004.  

Aaargh!!! 43 days into the task and 82% complete, and MS10 does an update and restart :-((((. Lost with a computation error.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56193 - Posted: 11 May 2017, 5:11:10 UTC - in response to Message 56192.  

I run Linux on all my machines, and that is all I have run this century. A thought occurred to me: is it possible to use your router's functions to block MS domains to prevent automagical updates?

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56194 - Posted: 11 May 2017, 5:25:14 UTC - in response to Message 56192.  

"Aaargh!!! 43 days into task and 82% complete and MS10 does an update and restart :-((((. Lost with a computation error."

It might not be a bad practice for anyone still running these very long tasks to return to making frequent backups. Two months is a long time to go without a crash or unexpected restart.

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 56195 - Posted: 11 May 2017, 5:57:22 UTC - in response to Message 56192.  

Hi Alan,

I see both your PCs are running Win10 Pro, so you can stop auto updates using Group Policy (can't do this for the Home version). See this link on tenforums.

I use this and never have a problem, as the updates will NOT be downloaded until I say it is OK. Although I configured the Group Policy to notify me before download AND before install, I find it installs immediately after download. That I can cope with.

However, I think I remember reading that the new Creators Update limits how long updates can be delayed. I can't find the link, but I seem to remember it was a reasonable time. Creators does make it easy to defer updates, but I'm not sure whether it will notify you at the end of the deferral period that it is going to install updates. See PCWorld here. It is still important to be notified of updates when running CPDN. I guess I'll find out more when the Creators Update comes through to my PCs.

Jim,

As far as I'm aware, there is no decent restore procedure in place. In the old days even restoring a couple of tasks was a nightmare, but with today's multiprocessor PCs is it really worth the headache? OK, did you back up 1 hour ago, 1 day ago, one week ago? The backup will restore the task OK, but the task will then start sending the same trickles again. Does the backend handle this OK? I don't know. As I've said before, as long as the PC is normally reliable, don't worry about a few failures. They will get reissued, and surely the researchers and programmers are well aware of the average failure rate and will have built this into how they handle the results. If not, they should be saving up for a new mainframe ;-)

For the record, I back up everything every day except for the BOINC task folders.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56196 - Posted: 11 May 2017, 6:20:39 UTC - in response to Message 56195.  
Last modified: 11 May 2017, 6:21:35 UTC

There was a time when I used to make a backup every morning. The “backend” handled any duplicate trickles just fine. You didn’t get any credit for the duplicates, but when you reached the point where you left off, it picked up without a hitch. I would rather repeat the crunching for all the cores for a day or two than lose something that I had been running for 43 days!
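
For anyone wanting to automate a morning backup like this, here is a minimal Python sketch. It is only an illustration, not an official CPDN or BOINC tool: the function name `backup_boinc_data`, the `boinc-<timestamp>` folder naming, and the `keep` count are all made up for this example, and you have to supply the location of your own BOINC data directory. It assumes the BOINC client has been shut down before the copy starts, since copying a directory that is still being written to may give a backup that will not restore cleanly.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root, keep=3, stamp=None):
    """Copy data_dir into a new timestamped folder under backup_root.

    Hypothetical helper: assumes the BOINC client is stopped and that
    backup_root is NOT located inside data_dir.
    Returns the path of the backup that was created.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    data_dir = Path(data_dir)
    backup_root = Path(backup_root)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {data_dir}")
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamped name, e.g. boinc-20170511-080000 (naming is an assumption).
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-{stamp}"
    shutil.copytree(data_dir, target)  # fails if target already exists

    # Names sort chronologically, so everything before the last `keep`
    # entries is an older backup that can be pruned.
    backups = sorted(p for p in backup_root.glob("boinc-*") if p.is_dir())
    for old in backups[:-keep]:
        shutil.rmtree(old)
    return target
```

To restore, the idea would be the reverse: stop the client, copy the chosen backup folder back over the data directory, and restart. As described above, the task then re-sends trickles it had already sent until it catches up to where it left off.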
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56197 - Posted: 11 May 2017, 6:26:43 UTC - in response to Message 56196.  

For me the question is how many of these very long tasks there are going to be. If they are going to be regular then I will start making backups again, especially as none of my machines has more than 4 cores, two of them being only dual core. If the long tasks are only going to be very occasional, then perhaps backing up only when I have one of them might be the way to go?

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,366,450
RAC: 15,463
Message 56283 - Posted: 21 May 2017, 18:12:33 UTC - in response to Message 56192.  

Just checked, and the reissues of this run have also failed with a compute error, after a much shorter time than mine did. Oh well, it might not have been MS's fault after all :-)
