climateprediction.net (CPDN) home page
Thread 'Replication and error counts'

old_user69295

Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 40696 - Posted: 17 Sep 2010, 16:02:08 UTC
Last modified: 17 Sep 2010, 16:03:14 UTC

No good deed goes unpunished.

I was trying to give other folks a shot at some of the new regional models (since I already had as many as I thought I could handle), so I aborted some I had not started yet. Since then, I've adjusted my queue depth to something more reasonable.

I was expecting that the WU's would be reassigned to other crunchers. Unfortunately, some of these WU's already had two other errors of various types from other machines. They are now unavailable to anyone because of "too many error results".

I would like to suggest that "aborted by user" and "detached from project" not be counted against the WU as errors. These are not issues with either BOINC or the science, and CPDN is being hurt by it.

=Mike
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40699 - Posted: 17 Sep 2010, 20:03:45 UTC - in response to Message 40696.  

It's a problem with the way the BOINC server code works.
This is a well-known problem, and it has been discussed behind the scenes for a year or so.

Basically, the server issues ALL the models specified by the values in the 3 lines just above the task list at the top of the workunit page, and then waits for results.
Data sets are only re-issued if the failures don't exceed those limits, and aborting models can push the count over them.
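
The re-issue rule described above can be sketched roughly as follows. This is only a simplified model, not the actual BOINC server code: the `WorkUnit` class, the outcome labels, the limit values, and the exact comparisons are assumptions made for illustration. The `count_user_aborts` switch is hypothetical and models the suggestion from earlier in the thread that aborts and detaches should not count as errors.

```python
from dataclasses import dataclass, field

# Outcome labels are illustrative only, not BOINC's internal constants.
SUCCESS, ERROR, ABORTED, DETACHED = "success", "error", "aborted", "detached"


@dataclass
class WorkUnit:
    max_error_results: int   # assumed limit on failed tasks
    max_total_results: int   # assumed cap on tasks ever issued
    outcomes: list = field(default_factory=list)

    def error_count(self, count_user_aborts=True):
        """Count failed tasks; optionally ignore aborts and detaches."""
        failures = {ERROR}
        if count_user_aborts:
            failures |= {ABORTED, DETACHED}
        return sum(1 for o in self.outcomes if o in failures)

    def can_reissue(self, count_user_aborts=True):
        """True while the workunit is still within both limits."""
        if self.error_count(count_user_aborts) > self.max_error_results:
            return False  # the "too many error results" state
        return len(self.outcomes) < self.max_total_results


# Hypothetical workunit that already has two errors from other machines.
wu = WorkUnit(max_error_results=2, max_total_results=5,
              outcomes=[ERROR, ERROR])
print(wu.can_reissue())                         # still within the limit
wu.outcomes.append(ABORTED)                     # a user aborts an unstarted task
print(wu.can_reissue())                         # abort counted as an error
print(wu.can_reissue(count_user_aborts=False))  # abort ignored (hypothetical)
```

Under these assumed limits, the abort is what tips the workunit over, which matches the situation described in the first post.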

Once people have more models than they need, the best solution is to Suspend the excess for later processing.


old_user69295

Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 40701 - Posted: 17 Sep 2010, 20:58:52 UTC - in response to Message 40699.  

If we asked Milo nicely, do you think he would reset WU's 6926193 and 6883105?
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40702 - Posted: 17 Sep 2010, 22:10:06 UTC - in response to Message 40701.  

Tolu has left the project, and Milo is on a long-awaited and several-times-postponed holiday.
So I'm thinking that the answer is: No.


mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40708 - Posted: 18 Sep 2010, 3:16:25 UTC

I agree with Les. It is nearly always better to suspend extra climate models and process them later rather than aborting them.

If you have too many models and cannot complete some before their deadline, do not worry. The CPDN servers accept results uploaded after model deadlines and late results will be used by the researchers. (This is only true for the CPDN servers, not for other projects.)

Don't attach any importance to the BOINC message 'Too many error results' on workunit pages. It doesn't apply to CPDN and is there because other projects need it.
