climateprediction.net (CPDN) home page
Thread 'Compute Errors: Output File Absent'

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46663 - Posted: 21 Jul 2013, 22:49:14 UTC

I've never really understood these errors, but my hosts have returned three compute errors recently:

Task 15898454 (Linux64/AMD)

Task 15897301 (Win7-64/Intel)

Task 15897286 (Win7-64/Intel)

The dreaded "output file absent" message showed up in the BOINC manager in each case. Is this a problem with the host or workunit setup? Something else?

Thanks,

MarkR

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 46666 - Posted: 21 Jul 2013, 23:01:40 UTC
Last modified: 21 Jul 2013, 23:03:02 UTC

The "output file absent" message is more of a comment than an error. The model data includes a list of all the files that should have been uploaded to the CPDN server by the time the model completes. When a model finishes, a note is made of any files that were not uploaded, and that is the "output file absent" message.
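As a rough illustration of that bookkeeping (not the actual BOINC or CPDN code), the check amounts to comparing the list of expected files against the files that were really uploaded. The function name, file names, and message wording below are all made up for the example.

```python
def absent_output_files(expected, uploaded):
    """Return the expected output files that were never uploaded.

    Both arguments are lists of file names. The result keeps the
    order of the expected list.
    """
    uploaded_set = set(uploaded)
    return [name for name in expected if name not in uploaded_set]


# Hypothetical file names, purely for illustration.
expected = ["model_1.zip", "model_2.zip", "model_3.zip"]
uploaded = ["model_1.zip"]

for name in absent_output_files(expected, uploaded):
    # A crashed model never produced these, so each is noted as absent.
    print(f"Output file {name} absent")
```

A model that crashes early leaves most of its expected files unwritten, which is why the message appears alongside almost any failure rather than pointing at a specific cause.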

The actual cause of the model crash is usually recorded in the stderr section of the task page. In the case of the models you list, there seem to be memory errors or access violations. The first thing to check is that the BOINC data folder is excluded from virus scanning, since that is a common cause of access violations.
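For anyone who wants to sort through several task pages at once, a minimal sketch of scanning saved stderr text for those two kinds of failure might look like this. The search patterns and the sample text are assumptions; the exact wording in a real stderr section may differ, so adjust the patterns to what you actually see.

```python
import re

# Assumed patterns: real stderr wording may differ from these.
PATTERNS = {
    "memory error": re.compile(r"out of memory|memory error", re.IGNORECASE),
    "access violation": re.compile(r"access violation", re.IGNORECASE),
}


def classify_stderr(text):
    """Return a sorted list of the failure labels whose pattern occurs in text."""
    return sorted(label for label, rx in PATTERNS.items() if rx.search(text))


# Invented sample text, not copied from a real task page.
sample = "Unhandled Exception: Access Violation while writing restart file"
print(classify_stderr(sample))
```

An empty list means none of the assumed patterns matched, not that the task was healthy.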

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46669 - Posted: 22 Jul 2013, 1:50:16 UTC - in response to Message 46666.  

The first thing to check is that the BOINC data folder is excluded from virus scanning, since that is a common cause of access violations.

Yeah...Whenever I set up a host for BOINC, that's the first thing I do. I confirmed that they are still set up that way, just to make sure.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 46670 - Posted: 22 Jul 2013, 8:30:21 UTC

On closer inspection, no-one has made any progress on any of those work units. There were some duff work units issued around that time and these are all re-issues from the same time (17/18-Apr-13). Nothing to do on your side ...

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46672 - Posted: 22 Jul 2013, 12:50:54 UTC - in response to Message 46670.  

On closer inspection, no-one has made any progress on any of those work units...

Ah, okay, I see that now. It looks like many of the tasks I have in progress are in the same boat. I've got some tasks that were created 6-9 months ago and have had multiple compute errors before I got them.

Am I wasting my time on those? I don't mind them erroring out if some worthwhile work is being done. But without assurance that trickle-ups are being recorded (and credit granted...ha-ha), I don't know.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 46673 - Posted: 22 Jul 2013, 13:03:20 UTC - in response to Message 46672.  

That's a slightly tricky one, since it would be a pity to abort a viable model that someone else just happened to crash. But if the other models in the work unit have failed with the "out of memory" error, then abort your copy as it'll do the same ...
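That rule of thumb can be written down as a small sketch. It assumes you have copied the error text of the earlier failed copies from the work unit page by hand, and it reads "the other models" strictly as "every earlier copy"; both the function name and that strict reading are my own choices for the example.

```python
def should_abort(sibling_errors):
    """Suggest aborting only if every earlier failed copy hit 'out of memory'.

    sibling_errors is a list of error strings from the other tasks in the
    same work unit. An empty list gives False: with no evidence, keep running.
    """
    if not sibling_errors:
        return False
    return all("out of memory" in err.lower() for err in sibling_errors)


# Invented error strings for illustration.
print(should_abort(["Out of memory", "out of memory at timestep 12"]))
print(should_abort(["out of memory", "access violation"]))
```

With mixed error codes the function returns False, which matches the cautious advice: a model that someone else merely happened to crash may still be viable.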

The delay in reporting trickles is just that: a delay. Trickles are being uploaded and will eventually be reported when the appropriate task runs server-side. Since credits are based on trickles, credits will start being awarded at that point.

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46675 - Posted: 22 Jul 2013, 15:47:05 UTC - in response to Message 46673.  

Tricky, indeed. I've checked a few of my predecessors and error codes are all over the place. I wouldn't be confident of what to go on.

It would be nice if we could be given an idea of how much things are backed up.