Thread 'Tons of errors: error code -240?'

wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 60066 - Posted: 3 May 2019, 22:35:04 UTC

Hopefully this is going in the right place.
I'm not sure what's going on here.
This computer, from BOINC's perspective, seems to be doing just fine. There are currently two tasks that have been running for days and don't seem to be stalling. Did it just take CPDN a while to figure out what would work with my machine?
https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1484652
In reading the FAQ I'm being told that the machine can be disconnected from the internet for prolonged periods, so this is confusing to me.
upload failure: <file_xfer_error>
<file_name>wah2_sam25_s3hj_200312_13_719_011512003_2_r1565019525_1.zip</file_name>
<error_code>-240 (stat() failed)</error_code>

https://www.cpdn.org/cpdnboinc/result.php?resultid=21626096

I have it set to 3 CPUs to prevent excessive RAM usage. It does have 6 GB, which in theory should be enough for the smaller tasks.

I just want to make sure I'm actually contributing rather than wasting CPU cycles.
Any help appreciated here. Is this normal?
Thanks.
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60067 - Posted: 4 May 2019, 2:15:43 UTC - in response to Message 60066.  

The "Signal 11 received: Segment violation" is the key part of the error message, and just says that something went wrong, probably in the software. The "upload failure: <file_xfer_error>" usually just means that an expected file was absent (probably because the work unit failed).

So I don't see anything wrong with your machine. We all see error messages like that. Keep crunching. If you don't get success in three or four more work units, come back for help.
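
To illustrate the "stat() failed" part: the sketch below is not BOINC's code, only a minimal Python analogue showing that asking the operating system for information about a file that was never written fails. The helper name `upload_file_exists` and the temporary directory are assumptions made for the example.

```python
import os
import tempfile


def upload_file_exists(path: str) -> bool:
    """Return True if stat() succeeds on path, False if the file is absent."""
    try:
        os.stat(path)  # same kind of system call the error message refers to
        return True
    except FileNotFoundError:
        # The file was never created, e.g. because the model crashed early.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical location; a real BOINC project directory differs.
        expected = os.path.join(
            workdir, "wah2_sam25_s3hj_200312_13_719_011512003_2_r1565019525_1.zip"
        )
        print("before the model writes it:", upload_file_exists(expected))
        open(expected, "wb").close()  # simulate the model producing the file
        print("after the model writes it:", upload_file_exists(expected))
```

A task that dies within minutes never gets as far as creating its zip files, so every expected upload fails the same way.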
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 60068 - Posted: 4 May 2019, 2:46:02 UTC

Some batches have a higher frequency of Signal 11 errors, some of those SAM25 batches in particular. It's doubtful that it was due to a problem with your computer.
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 60069 - Posted: 4 May 2019, 2:48:57 UTC - in response to Message 60067.  

Thanks!
As an aside, what should I be looking for to know what batch and what length of WU I'm crunching, out of curiosity?
I'm assuming for instance that sam50 is South America 50 km, but I just want to make sure.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60070 - Posted: 4 May 2019, 3:15:44 UTC

Using the model name that you mentioned:
wah2_sam25_s3hj_200312_13_719_011512003_2_r1565019525_1.zip

200312 is the start year/month
13 is the number of months that it's intended to run
719 is the batch number, useful for quickly identifying a model type
2 is the number of attempts, starting with zero (0) (So the one that you had was on its last chance.)

The 1 just before the ".zip" is the data file number that's trying to get back to the researchers.
Or, in this case, NOT.
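
The breakdown above can be sketched as a small parser. This is only an illustration: the function name `parse_upload_name`, the assumption that every name has exactly ten underscore-separated fields in these positions, and reading the trailing digits of the region code as the resolution in km are all mine, and the fields not explained in this thread are simply skipped.

```python
import re
from typing import NamedTuple


class UploadName(NamedTuple):
    region: str         # region code as written in the name, e.g. "sam25"
    resolution_km: int  # trailing digits of the region code (assumed to be km)
    start_year: int
    start_month: int
    run_months: int     # number of model months the run is intended to cover
    batch: int
    attempt: int        # zero-based attempt counter
    file_number: int    # which data file this upload is


def parse_upload_name(name: str) -> UploadName:
    """Split an upload file name into the fields described in this thread."""
    stem = name[:-4] if name.endswith(".zip") else name
    parts = stem.split("_")
    if len(parts) != 10:
        raise ValueError(f"expected 10 underscore-separated fields, got {len(parts)}")
    # Fields 1, 3, 7 and 9 are not explained in this thread, so they are ignored.
    _f1, region, _f3, start, months, batch, _f7, attempt, _f9, file_no = parts
    match = re.fullmatch(r"([a-z]+)(\d+)", region)
    if match is None or len(start) != 6 or not start.isdigit():
        raise ValueError(f"unexpected region or start date in {name!r}")
    return UploadName(
        region=region,
        resolution_km=int(match.group(2)),
        start_year=int(start[:4]),
        start_month=int(start[4:]),
        run_months=int(months),
        batch=int(batch),
        attempt=int(attempt),
        file_number=int(file_no),
    )


if __name__ == "__main__":
    print(parse_upload_name(
        "wah2_sam25_s3hj_200312_13_719_011512003_2_r1565019525_1.zip"
    ))
```

For the file name in question this gives batch 719, a December 2003 start, 13 model months, attempt 2 and data file 1, matching the list above.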

That long list of file failures is just BOINC saying that it can't find them to upload them.
Which is because that model failed after 7 minutes, long before any useful amount of data could be created, and zipped up ready to return.

I'm assuming for instance that sam50 is South America 50 km, but I just want to make sure.

That's correct.

Sometimes here, just lots of people not being able to get a batch of models to run is "useful" information for the researchers.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 60071 - Posted: 4 May 2019, 14:31:03 UTC

There are a bunch of region map images that Iain Inglis posted in this thread:

https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8258#54659
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 60073 - Posted: 4 May 2019, 23:31:34 UTC - in response to Message 60070.  

Thank you! This is exactly what I was looking for.
I've actually written all of this down in case it happens again.

What WUs tend to run the longest, out of curiosity? Especially under Windows, since that is my primary OS.
Thanks!
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 60074 - Posted: 5 May 2019, 4:49:27 UTC - in response to Message 60073.  

Usually anything with a 25 km resolution. The more model months, the longer it will take, so a 25 month SAM25 or EU25 would take longer than a SAM50 or EU50 with the same number of model months.
wolfman1360

Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 60075 - Posted: 5 May 2019, 5:53:22 UTC - in response to Message 60074.  

Thanks! Much appreciated.
I'm getting a Lenovo X1 Extreme with the i7-8750H. That should be able to do some pretty nice work, hopefully. Pretty excited for it to arrive next week.