climateprediction.net (CPDN) home page
Thread 'Workunit error - check skipped'

Skip Da Shu
Joined: 31 Aug 04
Posts: 42
Credit: 15,308,708
RAC: 298
Message 35075 - Posted: 21 Sep 2008, 17:04:56 UTC

I am noticing that most, if not all, of my returned WUs that are marked as "success" have a 'valid state' of "Workunit error - check skipped". Because some of these are OC'd dedicated crunchers, I was about to undertake an across-the-board reduction in clock speeds, until I checked a couple of 'not overclocked' machines (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=8270) and saw them getting the same result. Am I really contributing anything at all these days? Is this 'normal'?
- da shu @ HeliOS,
"Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer"
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 35076 - Posted: 21 Sep 2008, 17:55:33 UTC

I see that message too, and my boxes are not overclocked.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,699,166
RAC: 9,972
Message 35077 - Posted: 21 Sep 2008, 20:06:13 UTC

The "Workunit error - check skipped" message is a server (mis)configuration error, and does not reflect badly on your computer or the work you're offering to CPDN. This particular part of BOINC is ignored by CPDN, and all results are gratefully received.

If you click through to the page for the whole workunit, like WU 6160816, you'll see the problem at the top, above the list of tasks:

initial replication 10
max # of error/total/success tasks 2, 1, 1
errors Too many error results Too many total results

Which, if you think about it, makes no sense - why send out ten copies of a job if you're only prepared to accept one of them back (while allowing two of them to be in error)?
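
To make the mismatch concrete, here is a minimal Python sketch of how limits like these could produce the flags shown above. It is not BOINC's actual server code: the function name, the "strictly greater than" comparisons, and the task counts in the usage note are assumptions for illustration only.

```python
def workunit_error_flags(total, errors, successes,
                         max_error, max_total, max_success):
    """Return the error flags a workunit would collect.

    Hypothetical helper: assumes a flag is raised whenever a count
    strictly exceeds its configured limit.
    """
    flags = []
    if errors > max_error:
        flags.append("Too many error results")
    if total > max_total:
        flags.append("Too many total results")
    if successes > max_success:
        flags.append("Too many success results")
    return flags


# Limits from the workunit page: max error/total/success = 2, 1, 1.
# The counts (10 tasks sent, 3 failed, 1 succeeded) are made up.
print(workunit_error_flags(total=10, errors=3, successes=1,
                           max_error=2, max_total=1, max_success=1))
```

With ten tasks sent out against a total limit of one, the "Too many total results" flag is raised as soon as the workunit is created, whatever your computer does.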

The answer lies in the way in which CPDN has been shoehorned into a BOINC framework which, frankly, doesn't fit it very well. Most BOINC projects send out much shorter tasks, and double-check the validity of the results returned by insisting that two (or more) replies from independent computers match to some degree of acceptability. If they don't, then one or more extra results are sent out until agreement is reached, or the limits are reached.
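
The quorum idea used by those other projects can be sketched roughly as follows. This is an illustration only, not any project's real validator: the function name, the numeric tolerance, and the representation of a result as a single number are all assumptions.

```python
def quorum_reached(results, min_quorum=2, tolerance=1e-6):
    """Return an agreed value once enough results match, else None.

    Hypothetical sketch: each result is one float, and two results
    "match" if they differ by no more than `tolerance`.
    """
    for candidate in results:
        matching = [r for r in results if abs(r - candidate) <= tolerance]
        if len(matching) >= min_quorum:
            return candidate
    return None


# Two disagreeing replies: no quorum, so another task would be sent out.
print(quorum_reached([1.0, 2.0]))
# A third reply agrees with the first: quorum reached.
print(quorum_reached([1.0, 2.0, 1.0]))
```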

Because of the length of time the tasks take, CPDN can't operate like that. We can't wait for one task to finish before sending out a comparison, and the high chance of failure makes it unfeasible to check results by directly comparing the results from independent computers.

Instead, CPDN sends out all of its 'replication' tasks when the workunit is first created, and is grateful to accept any and all that come back. But the vestiges of the quorum checking system are still in place, and if the numbers haven't been configured properly, then the automatic (but irrelevant) messages that you've spotted are the result.