Message boards : Number crunching : error report for wah2_sas50
Author | Message |
---|---|
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
SAS50 workunits 'break down' after a few seconds. <stderr_txt> Model crashed: INANCILA:integer header error tmp/xadae.pipe_dummy Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=15784, selfPID=15784, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=15784, selfPID=16112, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_ain::Monitor... 15:33:37 (16112): called boinc_finish(0) |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Have you had a bunch of them do so? Are any tasks that you might have downloaded today still running? They think there's a corruption in a restart file (or restart files) but are unsure how widespread it is. |
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
No, all workunits have crashed so far. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
I see that 5 on Bonsai's computer have crashed so far. https://www.cpdn.org/cpdnboinc/results.php?hostid=1377284 When a second one of mine crashed I had to wait an hour for the timeout before it would show as crashed on its page. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Quote: "I see that 5 on Bonsai's computer have crashed so far. https://www.cpdn.org/cpdnboinc/results.php?hostid=1377284" You can update the project in BOINC Manager to show the crashed task on the webpage immediately. Of course, if you want it to request more tasks, that resets the communication time to 1 hour. I've been doing that since I appear to be having no luck downloading a good task, and now I've "fulfilled my daily quota" for both PCs. |
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
That's right. Quote: "I see that 5 on Bonsai's computer have crashed so far." But I received 5 more workunits, and they all crashed as well, after 13 or 14 seconds. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
Quote: "You can update the project in BOINC Manager to show the crashed task immediately on the webpage." Should have thought of that! Having also crashed through my daily quota, that machine is now crunching other projects on one of its two cores. I am keeping my marginally faster machine running native Linux work in the hope that at some stage testing might lead to main site work for it again. |
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
Nothing new: the next five workunits crashed. That makes 15 WUs with "Model crashed: INANCILA:integer header error". |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
Quote: "Nothing new: The next five workunits crashed." Thanks. If anyone has any of these tasks (batches 703, 704 and 705) running past the first few seconds (13 on my box) without crashing, please let us know. Unsent tasks may be withdrawn, but if some are working OK the withdrawn ones will be re-issued, as that would mean it was a science issue rather than purely a dodgy XML file. Edit: unsent tasks have been paused for now. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
I've got two of these batch 705, they ran for 30 secs and crashed with INANCILA:integer header error. (running WINE, BOINC 7.8.3) |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
Quote: "I've got two of these batch 705, they ran for 30 secs and crashed with INANCILA:integer header error. (running WINE, BOINC 7.8.3)" 17 seconds longer than my 703s managed! The priority now: if anyone has tasks that are not crashing, please post! |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,985,838 RAC: 14,284 |
Just had a 704 crash after about 12 seconds. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
No reports of these tasks running. Team at Oxford are investigating the problem. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
Problem identified. Once the correct files are uploaded to the system, the batches will go out again. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
These have started pouring into the hopper, and one from batch 706 is now about 8 minutes in, well past the point where they were crashing before due to some misconfigured files. |