climateprediction.net (CPDN) home page
Thread 'WU Unrecoverable error root cause'

Questions and Answers : Unix/Linux : WU Unrecoverable error root cause

old_user155703

Joined: 30 Jan 06
Posts: 2
Credit: 84,218
RAC: 0
Message 19819 - Posted: 31 Jan 2006, 13:30:38 UTC

With BOINC 5.2.13 and SETI I have a short but successful history. For prediction I have a shorter but totally unsuccessful history: 3 WUs begun, and all 3 failed within about a day {file_xfr_err...}. I see from this forum that I am not alone in having these errors, but I do not find that a solution has been implemented, nor do I find a pattern in the reports.

My temp solution has been to prevent new work. My question is do I need to modify something in my Linux FC-4 platform? How long should I wait before resuming work on prediction?

In further reading I also see some concern about large WU size, in my case a predicted 3-month, 2400 hr commitment to one WU. I share the concern about beginning a WU but being unable to complete it after, say, many days of work. Are statistics available to demonstrate completion success for all WU distributed by the project?

Regards
Skip
ID: 19819
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19829 - Posted: 31 Jan 2006, 20:04:45 UTC

No answer to your failures, except to suggest you look at yabsd.out as per the other answers.

> Are statistics available to demonstrate completion success for all WU distributed by the project?

The front page has some stats on the number of completed models, and about a year ago some of the regular 'users' did some research on the finished/failed ratio.

Unlike, say, SETI, where all data searched needs to be completed, here it's statistics. Only a proportion of data sets, spread over a wide area (or at least, a wide area of current interest to the researchers), needs to be completed.
It's like a giant jigsaw puzzle. Up close, missing pieces are obvious, and the nature of the picture hard to perceive. But move away, and the overall nature of the image becomes clear.
Or, if you have nine squares together, forming a 3x3 square, and they are all blue, then if the centre square is missing, it's a good guess that it too is blue.
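
To make the 3x3 analogy concrete, here is a minimal Python sketch. It is purely illustrative and is not anything the project itself runs; the function name `guess_missing` and the grid layout are made up for this example. It guesses a missing square by taking the most common value among its known neighbours.

```python
from collections import Counter


def guess_missing(grid, row, col):
    """Guess grid[row][col] as the most common value among its known neighbours.

    Hypothetical helper for illustration only; None marks a missing square.
    """
    neighbours = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if (r, c) == (row, col):
                continue  # skip the missing square itself
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                value = grid[r][c]
                if value is not None:
                    neighbours.append(value)
    if not neighbours:
        return None  # nothing to base a guess on
    return Counter(neighbours).most_common(1)[0][0]


# Eight blue squares around a missing centre square.
grid = [
    ["blue", "blue", "blue"],
    ["blue", None, "blue"],
    ["blue", "blue", "blue"],
]
print(guess_missing(grid, 1, 1))  # prints "blue"
```

The more neighbours agree, the safer the guess, which is why a scattering of failed models matters less here than a lost block of data would on a project that must process every piece.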

ID: 19829
old_user155703

Joined: 30 Jan 06
Posts: 2
Credit: 84,218
RAC: 0
Message 19860 - Posted: 1 Feb 2006, 16:41:37 UTC - in response to Message 19829.  

Thanks Les for your thoughtful reply,

> No answer to your failures, except to suggest you look at yabsd.out as per the other answers.


Presently my model is empty, thus no yabsd.out to look at. More significantly, it looks like my errors are also found by others. If the programmers on the team are actively chasing my "bug", I want to be helpful, but if my FC-4 platform is too far from the mainstream to be of interest then I should move on (if in fact the problem is platform specific).

I did not find the specific result, but the stats are interesting indeed.

I'm OK with neighborhoods from the project perspective. From my perspective, though, if others can successfully process a given WU while I cannot, then isn't it foolhardy for me to continue down that path?

Thanks for attention to this small detail...
ID: 19860
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19869 - Posted: 1 Feb 2006, 20:37:03 UTC

FC-4 is OK as far as I know.
It's just that the software is huge; the source code is over 50 MB, and there are over a million lines of it. And the compiled result has to produce models which are consistent with the models produced by the original 64-bit code running on the Met Office's supercomputers.
You can see what a few million can buy, as well as the machines that preceded them, here: http://www.meto.gov.uk/research/nwp/numerical/computers/index.html

Also, the two programmers are a bit tied up at present getting ready for the next phase, experiment 2.
But they are working on it, and, hopefully, a Mac version as well, on their 'new' Mac.

Not using Linux myself, I can only pass on what others have said about problems, but if you also run other projects, I'd suggest that you concentrate on them for a while, and look back here in a few weeks to see if the new models have started.

ID: 19869

©2024 cpdn.org