climateprediction.net (CPDN) home page
Thread 'How much of a Compute error task is useful?'

Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 52509 - Posted: 9 Sep 2015, 13:10:17 UTC
Last modified: 9 Sep 2015, 13:10:53 UTC

Hi, I recently had a task crash after restarting... :(

I believe it was approx. 95% complete having run for 1,367,209 seconds...
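
For scale, the run time quoted above can be converted to days. This is only a minimal Python sketch: the 95% figure is the poster's approximation, and the projection of the remaining time assumes progress is linear in run time, which may not hold for a real model.

```python
from datetime import timedelta

elapsed_s = 1_367_209      # run time quoted in the post, in seconds
fraction_done = 0.95       # approximate, as stated in the post

elapsed = timedelta(seconds=elapsed_s)

# Assumption: progress is linear in time, so total = elapsed / fraction.
projected_total_s = elapsed_s / fraction_done
remaining = timedelta(seconds=round(projected_total_s - elapsed_s))

print(elapsed)     # 15 days, 19:46:49
print(remaining)   # rough estimate of the run time still needed
```

So the crash came after nearly 16 days of computing, with under a day of run time left on that rough estimate.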

Can anyone say how much of a task that ended with a Compute error can still be used by the project team?

The task is: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18678670

Thanks for any feedback.

Digby
Alan K

Joined: 22 Feb 06
Posts: 491
Credit: 30,975,898
RAC: 14,500
Message 52510 - Posted: 9 Sep 2015, 15:46:00 UTC - in response to Message 52509.  

Essentially anything already reported by trickles is useful data - or so I am led to believe.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52512 - Posted: 9 Sep 2015, 23:21:40 UTC

I suspect that the answer is: it depends.
To start with, it depends on what the researchers at a given research centre are trying to achieve: short segments of data (which is what all of the models are these days), or joining up the segments to make a model run of, e.g., a hundred years.
To join segments, they need ALL of the zips, because the last one (usually zip 13) contains the data to start up the next segment.
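
The "all of the zips" condition can be sketched in Python. This is an illustration only, not the project's actual tooling: the function name is made up, the default of 13 zips follows the "usually zip 13" remark, and the set of zips returned by the crashed task is hypothetical.

```python
def can_chain_next_segment(returned_zips, total_zips=13):
    """Return True only if every zip 1..total_zips was uploaded.

    total_zips=13 follows the "usually zip 13" remark; the real
    count depends on the model type.
    """
    return set(returned_zips) >= set(range(1, total_zips + 1))


# Hypothetical example: a model that crashed late has most zips,
# but not the last one, which holds the restart data.
crashed = range(1, 13)     # zips 1..12
complete = range(1, 14)    # zips 1..13

print(can_chain_next_segment(crashed))    # False
print(can_chain_next_segment(complete))   # True
```

The earlier zips from the crashed model may still be usable as short segments; only the joining-up use is ruled out.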

If the data from a failed model is considered important, then they can re-issue that data set with a new name. (All of the data sets now come directly from the researchers.)

Then there's the other way of looking at it:
If you're looking for, say, peaches, and there are some that have started going bad, then you'll usually pick one that isn't.
And the researchers can do the same with the model data.

And some of the modelling doesn't use the trickle_up files to return small amounts of data; they're just there to let the server know that the model is still alive, and to create credits.

If there are lots of failures for a given batch, then the trickle_up files + the zips can give them a clue as to what went wrong, and where it happened.

Digby

Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 52537 - Posted: 11 Sep 2015, 13:47:01 UTC

Thanks for the feedback.

OK, so the gist is: do what you can to complete a task, but if that fails, something can sometimes still be salvaged from the trickles and zips already received.

(FWIW I had another task error this morning when restarting the pc http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18759508.)

Running 24/7 will be more stable for task completion, but it seems ironic that on a desktop PC only used during the day this will consume more energy and ultimately contribute more to climate change...

I would like to shut down at night and back up tasks as well.

So I am now taking the following steps to help complete my tasks:
- I just upgraded BOINC to 7.6.7 from 7.4.23 using https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/locutusofborg-ppa

- Every time I shut down or reboot I have always suspended the CPDN project, but from now on I will ALSO suspend each task individually.

- I have also unchecked 'Leave non-GPU tasks in memory while suspended'.
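
The suspend-before-shutdown step can also be scripted with `boinccmd`, the command-line tool that ships with BOINC. The sketch below only builds and prints the commands. The project URL and the task name are placeholders (use the values that `boinccmd --get_tasks` reports on your machine), and suspending tasks before the project is just one possible order, not something the thread prescribes.

```python
import shlex

# Assumptions: both values are placeholders. Use the project URL and
# task names exactly as `boinccmd --get_tasks` reports them.
PROJECT_URL = "http://climateprediction.net/"
TASK_NAMES = ["hypothetical_task_name_0"]


def suspend_commands(project_url, task_names):
    """Build boinccmd argument lists: each task first, then the project."""
    cmds = [
        ["boinccmd", "--task", project_url, name, "suspend"]
        for name in task_names
    ]
    cmds.append(["boinccmd", "--project", project_url, "suspend"])
    return cmds


for cmd in suspend_commands(PROJECT_URL, TASK_NAMES):
    # Printed only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Replacing `suspend` with `resume` in both commands undoes this after the next boot.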

Let's see how it goes.

Cheers
