Ocean model crashed.
Author | Message |
---|---|
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Work Unit hadcm3n_t5wx_1980_40_007414564_0 crashed after timestep 259,200. Reason unknown. The stderr is shown below. <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4760, iMonCtr=1 Model crash detected, will try to restart... 
CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x77163A93 read attempt to address 0x40E476BC Engaging BOINC Windows Runtime Debugger... No Process Handle Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4312, selfPID=4312, iMonCtr=1 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x76F57353 read attempt to address 0xFFFFFFF8 Engaging BOINC Windows Runtime Debugger... Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_t5wx_1980_40_007414564/dataout/shmem_restart.day Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> Can anyone make sense of this? What is an Access Violation? What does Signal 11 mean? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Wiki article on signal 11. There are a lot of Suspends. Two possibilities: 1) Something, possibly the antivirus, is blocking access to some file at a critical moment. 2) You have the 'newish' BOINC option to stop processing when other program usage is high still set to its default of 25%. Or perhaps one of the several other 'slow down' options is being used. This could affect the programs, which don't like being interrupted. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I doubt that it is Norton Antivirus that is causing the problem. I excluded the BOINC folders in both the Programs and the ProgramData folders from scans. I did this for both regular scans and the so-called SONAR component. Edit: You're right, the "stop work if CPU usage is too high" setting was at 25%. I have reset it to 0. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Dave, I have only just got round to looking at integer benchmark scores for other computers and I now know just how huge the score is. What a shame that isn't reflected in the speed I get through work units! Dave |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Send message Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
This model http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=13367975 crashed at timestep 0 with an error I have not seen before: "INITTIME: Ocean basis time mismatch". <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Model crashed: INITTIME: Ocean basis time mismatch Model crashed: INITTIME: Ocean basis time mismatch Model crashed: INITTIME: Ocean basis time mismatch Model crashed: INITTIME: Ocean basis time mismatch Model crashed: INITTIME: Ocean basis time mismatch Model crashed: INITTIME: Ocean basis time mismatch Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
Hi 3rkko, I had a couple of these, too. Some of the yxxx series were not configured correctly, unfortunately. Fortunately, they crash straight away, so the only loss is the cost of the download, not weeks of CPU time. :) |
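On the questions in the first post: an Access Violation (0xc0000005) is the Windows status code for a program trying to read or write memory it is not allowed to touch, and signal 11 is SIGSEGV, the segmentation fault signal that reports the same kind of invalid memory access. For long stderr dumps like the one quoted there, a small script can make the pattern easier to see by counting the suspend and quit requests and pulling out the exception details. The following is only a rough sketch: the function name `summarize_stderr`, the lookup tables, and the abridged `SAMPLE` text are made up for illustration and are not part of BOINC or the CPDN application.

```python
import re

# Abridged from the stderr quoted in the first post (illustration only).
SAMPLE = """
<message> - exit code 193 (0xc1) </message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Model crash detected, will try to restart...
Unhandled Exception Detected...
Reason: Access Violation (0xc0000005) at address 0x77163A93 read attempt to address 0x40E476BC
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
"""

# Hypothetical lookup tables holding only the two codes discussed in this thread.
STATUS_CODES = {"0xc0000005": "Windows access violation (invalid memory access)"}
SIGNAL_NAMES = {11: "SIGSEGV (segmentation fault)"}

# Matches lines such as "Reason: Access Violation (0xc0000005) at address ..."
EXCEPTION_RE = re.compile(
    r"Reason:\s+(.+?)\s+\((0x[0-9a-fA-F]+)\)\s+at address\s+(0x[0-9a-fA-F]+)"
    r"\s+(read|write) attempt to address\s+(0x[0-9a-fA-F]+)"
)


def summarize_stderr(text):
    """Count suspend/quit requests and collect crash details from stderr text."""
    return {
        "suspend_requests": text.count("Suspend request from BOINC"),
        "quit_requests": text.count("Quit request from BOINC"),
        "restart_attempts": text.count("Model crash detected"),
        "exceptions": EXCEPTION_RE.findall(text),
        "signals": [int(n) for n in re.findall(r"Signal (\d+) received", text)],
        "exit_codes": [int(n) for n in re.findall(r"exit code (\d+)", text)],
    }


if __name__ == "__main__":
    summary = summarize_stderr(SAMPLE)
    print("Suspend requests:", summary["suspend_requests"])
    print("Quit requests:", summary["quit_requests"])
    for reason, code, _addr, access, target in summary["exceptions"]:
        meaning = STATUS_CODES.get(code.lower(), "unknown status code")
        print(f"{reason} {code}: {meaning}; {access} of {target}")
    for number in summary["signals"]:
        print(f"Signal {number}:", SIGNAL_NAMES.get(number, "unknown signal"))
```

A large number of suspend and quit requests ahead of the exception would be consistent with the suggestion above that frequent interruptions are upsetting the model, though the counts alone do not prove the cause.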
©2024 cpdn.org