climateprediction.net (CPDN) home page
Thread 'RESOLVED - Too Many Total Results / Too Many Errors (May Have a Bug)'

Message boards : Number crunching : RESOLVED - Too Many Total Results / Too Many Errors (May Have a Bug)
SekeRob
Joined: 21 Nov 06
Posts: 20
Credit: 318,377
RAC: 0
Message 41041 - Posted: 15 Nov 2010, 11:59:21 UTC

My client switched to High Priority for the task below, 3rd from the top, after I had not crunched with this computer under Windows for months. Judging by the header section, it looks like a serious waste of time and electricity: 3 copies have already completed successfully. So the question is: should this task be cut short, assuming all results would produce a matching simulation if run to completion, or at least one without meaningful differences?

For now, I've put this task back in stasis awaiting clarification.

cheers


Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 41041
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 41042 - Posted: 15 Nov 2010, 13:27:33 UTC

There are three completions but only one on Windows/Intel. Generally, different combinations of operating system and processor produce slightly different results. The project has established to its satisfaction that such differences amount to small random variations.

My personal rule is to complete any model in a work unit for which there are not two identical completions on the same platform (judged by the graphs). That means that the project could apply traditional BOINC-style validation to the work unit if it wanted.

If, for some reason, I have fallen a long way behind in a work unit for which there are two identical completions on the same platform as my machine then I would abandon that model for the reasons you cite.

Applying my rule to your model, the model should be completed - particularly as it hasn't far to go. However, you might equally take the view that one completion on any platform is good enough.
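
As a rough illustration of the rule above, here is a minimal Python sketch. The function name `should_complete`, the (platform, fingerprint) tuple layout and the fingerprint strings are all made up for the example; in practice the comparison is done by eye from the graphs, and this is not how the project itself validates anything.

```python
from collections import Counter

def should_complete(completions, my_platform):
    """Return True unless two identical completions already exist on my_platform.

    completions: list of (platform, fingerprint) tuples. The fingerprint is a
    hypothetical stand-in for "the graphs look identical".
    """
    # Keep only the results produced on the same OS/CPU combination as mine.
    same_platform = [fp for platform, fp in completions if platform == my_platform]
    counts = Counter(same_platform)
    # The rule is met once any fingerprint appears at least twice on this platform.
    has_matching_pair = any(n >= 2 for n in counts.values())
    return not has_matching_pair

# Three completions, only one of them on Windows/Intel (fingerprints invented).
completions = [
    ("Linux/AMD", "a1"),
    ("Windows/AMD", "b2"),
    ("Windows/Intel", "c3"),
]
print(should_complete(completions, "Windows/Intel"))  # True: finish the model
```

With a second matching Windows/Intel completion in the list, the same call would return False, which is the "abandon if a long way behind" case.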
ID: 41042
old_user92639
Joined: 13 Aug 05
Posts: 54
Credit: 117,227
RAC: 0
Message 41043 - Posted: 15 Nov 2010, 13:58:20 UTC

24 dec 1973 1:04:25 UTC <= problem?

LOL
ID: 41043
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 41044 - Posted: 15 Nov 2010, 14:14:29 UTC - in response to Message 41043.  

24 dec 1973 1:04:25 UTC <= problem?

LOL

Sometimes: it doesn't matter.
ID: 41044
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 41049 - Posted: 15 Nov 2010, 19:12:10 UTC
Last modified: 15 Nov 2010, 19:17:31 UTC

Yes, the first task in the WU list was on AMD and Linux so its results will be slightly different. The fourth task run by Martin was on AMD and Windows, so again its results will be slightly different. The eighth, like yours, was on Intel + Windows. Several other models have been abandoned by their owners.

Hiro Yamazaki, one of the researchers, told us specifically a few days ago that even if more than one model is completed on the same CPU type + OS, they will all be used. So (particularly as your model has progressed so far) I would complete it.

That red message on the WU is for other BOINC projects, not for CPDN. Ideally it should be removed/hidden. Sometimes it appears when every model in the WU has crashed or been abandoned, i.e. no results have been received by the project.
Cpdn news
ID: 41049
SekeRob
Joined: 21 Nov 06
Posts: 20
Credit: 318,377
RAC: 0
Message 41088 - Posted: 19 Nov 2010, 10:17:07 UTC - in response to Message 41049.  
Last modified: 19 Nov 2010, 10:20:57 UTC

Okay, I'll finish it up, but I would still like to understand how these small variations add to understanding when different CPUs/OSs generate random variation... If the same model is run twice on the same computer, will it be identical or not, I wonder :o|

If there is a way to change the OP title to signify conclusion, I'll do that.

Edit: editing seems to be time-limited, so if an admin can insert RESOLVED in the OP title, the thread may sink.

thx
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 41088
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 41091 - Posted: 19 Nov 2010, 11:09:07 UTC - in response to Message 41088.  
Last modified: 19 Nov 2010, 11:10:57 UTC

From the Science publications near the front of the project's web site:

C.G. Knight, S.H.E. Knight, N. Massey, T. Aina, C. Christensen, D.J. Frame, J.A. Kettleborough, A. Martin, S. Pascoe, B. Sanderson, D.A. Stainforth, M.R. Allen,
Association of parameter, software and hardware variation with large scale behavior across 57,000 climate models, PNAS, July 2007.

... these small variations add to understanding ...
All just statistics, I guess.
And years of experience as a climate physicist. Which I'm not.

****************************

And from the beta site (question by one of the testers/answer by Hiro):

I've also sometimes wondered whether two or more bit-identical results should be added to an ensemble. It's rather like allowing some people to vote twice in an election. But perhaps the numbers are so large that it makes no practical difference.

We treat the impact of different CPU type and OS as small perturbations.

Most of us, including myself, basically treat each run separately and put a goodness score by comparing it with observational data. Therefore, bit-identical time series will simply receive the same score, ie, plotted twice with the same colour.

If you are talking about attribution studies, I'm not 100% confident about exactly what scientists do, but my guess is that they assume the occurrence of each result is equally likely.
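
To make the "same score" point concrete, here is a small Python sketch. It assumes root-mean-square error as the goodness score, which is only a guess for illustration (the answer above does not say which metric is used), and every number in it is invented.

```python
import math

def rmse(model, observed):
    """Root-mean-square error between two equal-length series (assumed metric)."""
    squared = [(m - o) ** 2 for m, o in zip(model, observed)]
    return math.sqrt(sum(squared) / len(observed))

# Invented numbers: an "observed" series and three model runs.
observed = [14.0, 14.2, 14.1, 14.4]
runs = {
    "run_a": [14.1, 14.3, 14.0, 14.5],
    "run_b": [14.1, 14.3, 14.0, 14.5],  # bit-identical to run_a
    "run_c": [13.8, 14.6, 14.3, 14.0],  # a different platform, say
}

scores = {name: rmse(series, observed) for name, series in runs.items()}
for name, score in scores.items():
    print(name, score)
```

Here run_a and run_b get exactly the same score because their inputs are identical, so the second one adds a duplicate point rather than new information.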
ID: 41091
SekeRob
Joined: 21 Nov 06
Posts: 20
Credit: 318,377
RAC: 0
Message 41116 - Posted: 21 Nov 2010, 8:20:43 UTC - in response to Message 41091.  

"Completed": a final 1366 KB zip file was uploaded and the task was marked completed. Going by the 6748 total credit granted, 68 trickles were uploaded. The FAQs (outdated?) say there should be 72 trickles, which equates exactly to 7145 credit. If 4 trickles went missing, it means that in the 4 years, to the day, that I've been trying to complete even a single model, none (of 17) has ended up correct. Oh well.

--//--
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 41116
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 41117 - Posted: 21 Nov 2010, 8:33:56 UTC - in response to Message 41116.  

If you're talking about the model that started this thread (k2e2), then yes, there should be 72 trickles.
Slab ocean models have 3 phases of 24 trickles each.
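
For anyone checking the arithmetic, a quick Python sketch using only the figures quoted in this thread (72 trickles for 7145 credit, 6748 credit granted so far). It assumes credit is granted evenly per trickle, which the numbers suggest but nobody here has confirmed.

```python
PHASES = 3
TRICKLES_PER_PHASE = 24
expected_trickles = PHASES * TRICKLES_PER_PHASE

# Assumption: every trickle is worth the same credit.
credit_per_trickle = 7145 / expected_trickles
credit_for_68 = 68 * credit_per_trickle

print(expected_trickles)             # 72
print(round(credit_per_trickle, 2))  # 99.24
print(round(credit_for_68))          # 6748
```

So 6748 credit is what 68 trickles would give under that assumption, 4 short of the full 72.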


Backups: Here
ID: 41117
Ingleside
Joined: 5 Aug 04
Posts: 127
Credit: 24,517,986
RAC: 17,587
Message 41121 - Posted: 21 Nov 2010, 13:36:50 UTC - in response to Message 41116.  

"Completed": a final 1366 KB zip file was uploaded and the task was marked completed. Going by the 6748 total credit granted, 68 trickles were uploaded. The FAQs (outdated?) say there should be 72 trickles, which equates exactly to 7145 credit. If 4 trickles went missing, it means that in the 4 years, to the day, that I've been trying to complete even a single model, none (of 17) has ended up correct. Oh well.

--//--

I guess your result is 10248883. If so, it's now showing up with all 72 trickles.

It's normal with CPDN for there to be a delay between uploading a trickle and the trickle showing up on the web pages and being credited.

The trickle info is updated in roughly the same way that WCG updates its stats and badges: neither happens instantaneously, only every N hours. WCG updates every 12 hours; I'm not sure whether CPDN currently updates only every 24 hours or at some other interval.

ID: 41121
SekeRob
Joined: 21 Nov 06
Posts: 20
Credit: 318,377
RAC: 0
Message 41122 - Posted: 21 Nov 2010, 15:51:24 UTC - in response to Message 41117.  

It counts out to 72. The last one, at 00:23 UTC, corresponds to the client message log once the time offset is allowed for, so all is accounted for.

Thanks, all, for the comments and observations.

2309 climateprediction.net 21-11-2010 01:19 [sched_op] Starting scheduler request
2310 climateprediction.net 21-11-2010 01:19 Sending scheduler request: To send trickle-up message.
2311 climateprediction.net 21-11-2010 01:19 Not reporting or requesting tasks
2312 climateprediction.net 21-11-2010 01:19 [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
2313 climateprediction.net 21-11-2010 01:19 Started upload of hadsm3fub_k2e2_006460276_6_3.zip
2314 climateprediction.net 21-11-2010 01:19 Scheduler request completed
2315 climateprediction.net 21-11-2010 01:19 [sched_op] Server version 611
2316 climateprediction.net 21-11-2010 01:19 Finished upload of hadsm3fub_k2e2_006460276_6_3.zip
2317 climateprediction.net 21-11-2010 01:19 [sched_op] Starting scheduler request
2318 climateprediction.net 21-11-2010 01:19 Sending scheduler request: To report completed tasks.
2319 climateprediction.net 21-11-2010 01:19 Reporting 1 completed tasks, not requesting new tasks
2320 climateprediction.net 21-11-2010 01:19 [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
2321 climateprediction.net 21-11-2010 01:19 Scheduler request completed
2322 climateprediction.net 21-11-2010 01:19 [sched_op] Server version 611
2323 climateprediction.net 21-11-2010 01:19 [sched_op] handle_scheduler_reply(): got ack for task hadsm3fub_k2e2_006460276_6

Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 41122
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 41127 - Posted: 21 Nov 2010, 19:24:08 UTC

CPDN Tasks and procedures can be a bit of a shock for participants experienced in other projects!

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 41127
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 41129 - Posted: 21 Nov 2010, 23:22:40 UTC
Last modified: 21 Nov 2010, 23:23:07 UTC

Hi again Sekerob

Going back to your question about how tasks from the same workunit can create variations on different computers:

Computers with the same OS (Win, Linux, Mac) and the same CPU type (AMD, Intel) should produce bit-identical results. On CPDN there are 5 computer types:

Win + Intel
Win + AMD
Linux + Intel
Linux + AMD
Mac + Intel

The differences occur because each OS and CPU type handles the maths of the rounding errors slightly differently. In the publication quoted by Les it was found (for a previous model type) that the differences were not important. All computer types produced valid model results.
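
A tiny Python illustration of why rounding can matter: floating-point addition is not associative, so the same sum evaluated in a different order can differ in the last bit. This is only a general sketch of the kind of effect involved; it is not taken from the model code, and it does not show what any particular OS or CPU actually does.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # round after a + b, then add c
right = a + (b + c)   # round after b + c, then add a

print(left == right)  # False
print(left, right)
```

Over millions of timesteps in a chaotic model, last-bit differences like this can grow into visibly different (but equally valid) weather.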

If a computer is unstably overclocked it may not produce results that are bit-identical to results from the other computers of the same type. A very small % of model results is rejected by quality control; most probably come from badly-overclocked computers.
Cpdn news
ID: 41129
DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 41140 - Posted: 22 Nov 2010, 18:10:10 UTC - in response to Message 41129.  
Last modified: 22 Nov 2010, 18:10:31 UTC

In the publication quoted by Les it was found (for a previous model type) that the differences were not important. All computer types produced valid model results.

I wonder if different random number seeds would make a difference. In theory they should not, but do CPU/OS differences plus random seed differences add up to a significant difference?
ID: 41140