climateprediction.net (CPDN) home page
Thread 'Error while computing???'

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,824,485
RAC: 4,956
Message 59477 - Posted: 21 Jan 2019, 7:26:20 UTC - in response to Message 59472.  
Last modified: 21 Jan 2019, 7:26:38 UTC

Got segment violation errors on tasks from batches 777 and 780. Both appear to be after 9th zip file as zips from 10 onwards are not generated.

I have finished multiple 777/780, but maybe there are some duff ones or there was some local difficulty - those batches look fine, as far as I know.
ID: 59477
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 59478 - Posted: 21 Jan 2019, 12:17:42 UTC

After aborting the 781's I received three more of them.


Hopefully that will be it. Sarah is closing this batch with the abort function now. At least one moderator has a couple still running and so will be able to let Sarah know that it has worked.
ID: 59478
mngn
Joined: 13 Jul 18
Posts: 38
Credit: 62,933,508
RAC: 84,702
Message 59479 - Posted: 21 Jan 2019, 12:32:32 UTC - in response to Message 59477.  

Some crunchers are unknowingly crashing their WUs by doing too many Suspends.
ID: 59479
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59481 - Posted: 21 Jan 2019, 13:36:58 UTC - in response to Message 59478.  

I just got a 781 and aborted it.
ID: 59481
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 59482 - Posted: 21 Jan 2019, 16:19:40 UTC - in response to Message 59481.  

I just got a 781 and aborted it.


I think the abort signal may have now gone out as the number of tasks in progress has just dropped by about 8,000 https://www.cpdn.org/cpdnboinc/server_status.php
ID: 59482
Dave Roberts
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 59484 - Posted: 21 Jan 2019, 19:53:26 UTC - in response to Message 59479.  

Some crunchers are unknowingly crashing their WUs by doing too many Suspends.


In my experience, suspending is not a problem if the option "Leave non-GPU tasks in memory while suspended" is set in the account preferences. I suspend one computer quite regularly and have never had a problem with that setting. With this option set, the tasks also survive shutdown.

I guess it's a case of how to get the message across.
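
For anyone who would rather check the setting locally: as far as I know, the local counterpart of that web preference is the leave_apps_in_memory element in global_prefs_override.xml in the BOINC data directory. Below is a minimal Python sketch of switching it on. SAMPLE_OVERRIDE and keep_tasks_in_memory are made-up names for illustration, and the sketch edits a string rather than the real file, so treat it as a starting point only.

```python
import xml.etree.ElementTree as ET

# Assumed sample of a BOINC global_prefs_override.xml; a real file
# may contain many more settings than this one.
SAMPLE_OVERRIDE = """<global_preferences>
    <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>"""


def keep_tasks_in_memory(xml_text: str) -> str:
    """Return the preferences XML with leave_apps_in_memory set to 1."""
    root = ET.fromstring(xml_text)
    node = root.find("leave_apps_in_memory")
    if node is None:
        # Setting absent: add it explicitly instead of relying on a default.
        node = ET.SubElement(root, "leave_apps_in_memory")
    node.text = "1"
    return ET.tostring(root, encoding="unicode")


updated = keep_tasks_in_memory(SAMPLE_OVERRIDE)
print(updated)
```

The client would still need to re-read its local preferences before any change to the real file took effect.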
ID: 59484
Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 59486 - Posted: 21 Jan 2019, 20:57:41 UTC - in response to Message 59482.  

I just got a 781 and aborted it.


I think the abort signal may have now gone out as the number of tasks in progress has just dropped by about 8,000 https://www.cpdn.org/cpdnboinc/server_status.php

The number of different task types listed as in progress was reduced from 9 to just 4. See the graphs: http://ob.cakebox.net/cpdn_status/server_status.html
Maybe they just cleaned the database of obsolete applications.
ID: 59486
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59487 - Posted: 21 Jan 2019, 21:05:01 UTC

Yes, that's something that was talked about last year. About time it happened.

And I think that the link to that page needs renaming on the Main page, from Server status to Project status, seeing as how the servers haven't been listed there for several years.

However, batch 781 is now listed as closed, so that should put a stop to those problems, provided everyone's computers contact the server to get the kill message.
ID: 59487
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,824,485
RAC: 4,956
Message 59493 - Posted: 22 Jan 2019, 15:49:50 UTC - in response to Message 59487.  

My two running 781 models have uploaded Zips and not been killed, so I've now aborted them both ...
ID: 59493
Iceberg
Joined: 28 Dec 17
Posts: 18
Credit: 1,097,261
RAC: 147
Message 59510 - Posted: 25 Jan 2019, 21:56:57 UTC

Hi all.

What does this error message mean? This WU was functioning perfectly for a long time and suddenly ended with a computation error.

"Signal 11 received: Segment violation
Signal 11 received: Software termination signal from kill
Signal 11 received: Abnormal termination triggered by abort call
Signal 11 received, exiting...
15:24:03 (9176): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8928, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6988, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_ain::Monitor...
15:25:09 (6988): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_11.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_12.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_13.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_14.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_15.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_16.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_restart.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>"
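
A long list of upload failures like this is easier to read once it is reduced to file name and error code. Below is a minimal Python sketch that pulls those out of the text; LOG_EXCERPT holds two of the entries copied from the log above, and failed_uploads and PATTERN are made-up names, not part of BOINC.

```python
import re

# Two of the seven entries from the posted log, copied verbatim.
LOG_EXCERPT = """<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_11.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_safr50_a0ax_201412_16_779_011704417_1_r358526244_restart.zip</file_name>
<error_code>-240 (stat() failed)</error_code>
</file_xfer_error>"""

# A file name followed by its numeric error code and the reason in brackets.
PATTERN = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)</error_code>"
)


def failed_uploads(log_text):
    """Return (file name, error code, reason) for each upload failure."""
    return [
        (m["name"], int(m["code"]), m["reason"])
        for m in PATTERN.finditer(log_text)
    ]


for name, code, reason in failed_uploads(LOG_EXCERPT):
    print(f"{code:>5}  {reason:<15}  {name}")
```

Every entry here carries the same code, -240 (stat() failed), which would fit zip files that were never written because the model had already crashed, though that reading is a guess from this log alone.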
ID: 59510
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 59511 - Posted: 25 Jan 2019, 22:21:08 UTC - in response to Message 58430.  
Last modified: 25 Jan 2019, 22:24:06 UTC

Seg faults are a problem with the task rather than your computer.
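
A side note for anyone decoding the log: on Linux and other POSIX systems, signal 11 is SIGSEGV, a segmentation fault, which matches the "Segment violation" line. Below is a minimal Python sketch for looking up a signal number. signal_name is a made-up helper, and the mapping comes from whichever operating system Python is running on.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return f"unknown signal {number}"


print(signal_name(11))
```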
ID: 59511
Iceberg
Joined: 28 Dec 17
Posts: 18
Credit: 1,097,261
RAC: 147
Message 59512 - Posted: 25 Jan 2019, 22:28:21 UTC - in response to Message 59511.  

Thanks for the quick reply. Glad to hear the computer seems to be functioning properly in BOINC.
ID: 59512