climateprediction.net (CPDN) home page
Thread 'Why do I keep getting a 'Computation Error'?'

Message boards : Number crunching : Why do I keep getting a 'Computation Error'?
old_user169398

Joined: 27 Feb 06
Posts: 2
Credit: 86,021
RAC: 0
Message 46982 - Posted: 7 Sep 2013, 15:06:14 UTC

Most times, this program takes about 500 hours to complete. For the past few work units, when only about 100 hours are left, it stops and reports 'Computation Error'. Any idea why it keeps doing this? All other projects work fine. I copied/pasted the event log from when this occurred:

9/7/2013 4:26:34 AM | climateprediction.net | Sending scheduler request: To send trickle-up message.
9/7/2013 4:26:34 AM | climateprediction.net | Not requesting tasks: don't need
9/7/2013 4:26:38 AM | climateprediction.net | Scheduler request completed
9/7/2013 4:26:43 AM | climateprediction.net | Computation for task hadcm3n_47ys_2020_40_008393219_4 finished
9/7/2013 4:26:43 AM | climateprediction.net | Output file hadcm3n_47ys_2020_40_008393219_4_3.zip for task hadcm3n_47ys_2020_40_008393219_4 absent
9/7/2013 4:26:43 AM | climateprediction.net | Output file hadcm3n_47ys_2020_40_008393219_4_4.zip for task hadcm3n_47ys_2020_40_008393219_4 absent

Thanks for any input.

Nathan
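The two 'absent' lines are the interesting part of that log: the client finished the task but did not find two of the output zips it expected, which is consistent with the model stopping before it wrote them. For anyone with a longer event log, here is a minimal Python sketch that pulls those lines out. The line layout is assumed from the excerpt above, and the helper name `absent_outputs` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above:
# "<date> <time> <AM/PM> | <project> | <message>"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_text: str) -> dict:
    """Map each task name to the output files the client reported as absent."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not an event-log line
        match = ABSENT.search(parts[2])
        if match:
            filename, task = match.groups()
            missing[task].append(filename)
    return dict(missing)


log = """\
9/7/2013 4:26:43 AM | climateprediction.net | Computation for task hadcm3n_47ys_2020_40_008393219_4 finished
9/7/2013 4:26:43 AM | climateprediction.net | Output file hadcm3n_47ys_2020_40_008393219_4_3.zip for task hadcm3n_47ys_2020_40_008393219_4 absent
9/7/2013 4:26:43 AM | climateprediction.net | Output file hadcm3n_47ys_2020_40_008393219_4_4.zip for task hadcm3n_47ys_2020_40_008393219_4 absent
"""

for task, files in absent_outputs(log).items():
    print(task, "->", ", ".join(files))
```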
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 46983 - Posted: 7 Sep 2013, 15:49:39 UTC
Last modified: 7 Sep 2013, 15:52:11 UTC

Which one of your computers is it? (Could you post links to both the computer and the task in question?) One of them has a lot of 'Aborted by user' tasks...




Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
15999324 | 8540636 | 1 Sep 2013 15:45:42 UTC | 5 Sep 2013 4:51:14 UTC | Aborted by user | 11,918.64 | 11,759.16 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
15973944 | 8591527 | 30 Aug 2013 9:33:20 UTC | 1 Sep 2013 15:45:42 UTC | Aborted by user | 43,380.69 | 32,350.35 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
15928546 | 8557817 | 20 Aug 2013 9:18:11 UTC | 30 Aug 2013 9:33:20 UTC | Aborted by user | 130,565.45 | 107,276.60 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
15925569 | 8339457 | 18 Aug 2013 22:03:05 UTC | 19 Aug 2013 1:48:48 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P Pacific North West v6.09
15855491 | 8540503 | 23 Jun 2013 14:15:45 UTC | 16 Aug 2013 6:55:51 UTC | Aborted by user | 583,111.53 | 450,435.70 | 0.00 | 933.12 | UK Met Office Coupled Model Full Resolution Ocean v6.07
15815532 | 8474790 | 1 Jun 2013 11:06:45 UTC | 16 Jun 2013 20:41:25 UTC | Aborted by user | 58,233.55 | 48,268.72 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
15798768 | 8473794 | 27 May 2013 11:36:30 UTC | 1 Jun 2013 1:15:04 UTC | Aborted by user | 13,557.45 | 13,256.55 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
15796716 | 8469378 | 26 May 2013 12:06:13 UTC | 27 May 2013 5:45:59 UTC | Aborted by user | 5,081.69 | 4,825.03 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07

I'm a volunteer and my views are my own.
News and Announcements and FAQ
old_user169398

Joined: 27 Feb 06
Posts: 2
Credit: 86,021
RAC: 0
Message 46984 - Posted: 7 Sep 2013, 15:56:44 UTC - in response to Message 46983.  
Last modified: 7 Sep 2013, 15:59:25 UTC

The computer with the issue is the one with ID: 1047569 running Windows 7.

On the other computer I often abort tasks: when it downloads work, it says the task won't complete until after the deadline, even if it's run 24/7 (that computer is the family computer). It is never run 24/7, since three other people use it. I should just remove Climate from that one. Thanks for the reply.

The link to the computer is:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1047569

The link to the most recent task is:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=15802084
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47034 - Posted: 13 Sep 2013, 5:56:50 UTC - in response to Message 46984.  
Last modified: 13 Sep 2013, 6:02:18 UTC

What I am seeing from this crash is exit code 193 and signal 11. The other crashes on the same PC look similar. The following thread is from someone who had the same combination of error codes and seems to have fixed them now. The two things he did were to exclude the temporary directories from the antivirus scan and to replace an elderly disk drive.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7602&nowrap=true#46964


Also, I can see that your models were running OK in 2012 but started crashing in 2013. Can you think of anything that changed then?



(unknown error) - exit code 193 (0xc1)
...
andled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77E843D0 read attempt to address 0x40CDE394

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77E83AB3 read attempt to address 0x40CDE390

Engaging BOINC Windows Runtime Debugger...

Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o6zx_2020_40_008373608/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
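For reference, exit code 193 is 0xc1 in hex, 0xc0000005 is the Windows status code for an access violation (an invalid memory access), and signal 11 is conventionally SIGSEGV, a segmentation fault. A minimal Python sketch that pulls these indicators out of a task's stderr text is below; the regular expressions are assumptions built only from the lines in this excerpt, and `summarise_stderr` is a hypothetical helper, not part of BOINC.

```python
import re


def summarise_stderr(text: str) -> dict:
    """Collect the error indicators from a task's stderr (patterns taken from the excerpt above)."""
    summary = {}
    exit_match = re.search(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)", text)
    if exit_match:
        summary["exit_code"] = int(exit_match.group(1))
        summary["exit_code_hex"] = exit_match.group(2)
    # Each hit is a tuple of (exception code, faulting address).
    summary["access_violations"] = re.findall(
        r"Access Violation \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+)", text
    )
    signal_match = re.search(r"Signal (\d+) received", text)
    if signal_match:
        summary["signal"] = int(signal_match.group(1))
    summary["unserialisable_files"] = re.findall(r"Cannot serialize file (.+)", text)
    return summary


# Raw string so the Windows backslashes are kept as-is.
stderr = r"""(unknown error) - exit code 193 (0xc1)
Reason: Access Violation (0xc0000005) at address 0x77E843D0 read attempt to address 0x40CDE394
Reason: Access Violation (0xc0000005) at address 0x77E83AB3 read attempt to address 0x40CDE390
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o6zx_2020_40_008373608/dataout/shmem_restart.day
Signal 11 received, exiting...
"""

print(summarise_stderr(stderr))
print(hex(193))  # 0xc1: the same exit code written in hex
```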



Could I suggest the following as a starting point:

* Change your settings to 'Leave tasks in memory when suspended' = Y, 'suspend if CPU usage is above %' to 0%, 'Use at most ... % of CPU' to 100.00. This will prevent the model from being swapped out of memory.

* Make sure you shut down BOINC before shutting down Windows (right-click on the BOINC icon, choose snooze, wait a few seconds, then right-click and exit). Similarly, if you are about to do something CPU-intensive, such as gaming, put it into snooze mode.

* Make sure that the BOINC data directories, and also the temporary directories (c:\temp, c:\windows\temp, c:\users\your-user-id\appdata\local\temp or whatever), are excluded from any antivirus scans.
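The usual way to change the first group of settings is the BOINC Manager's computing preferences. As I understand it, the Manager stores local overrides in a file called global_prefs_override.xml in the BOINC data directory (C:\ProgramData\BOINC on this machine, judging by the path in the log). For anyone who prefers to script it, here is a minimal Python sketch that builds such a file. The tag names are assumptions to check against the BOINC documentation for your client version, the mapping of 'Use at most ... % of CPU' to CPU *time* is also an assumption, and saving the output over an existing override file replaces whatever it already contained.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for BOINC's global_prefs_override.xml; check them against
# the BOINC documentation for your client version before relying on them.
SUGGESTED = {
    "leave_apps_in_memory": "1",       # 'Leave tasks in memory when suspended' = Y
    "suspend_cpu_usage": "0.000000",   # suspend-on-CPU-usage threshold = 0 (disabled)
    "cpu_usage_limit": "100.000000",   # 'Use at most ... % of CPU' (time) = 100
}


def build_prefs_override(settings: dict) -> str:
    """Return the XML text for a global_prefs_override.xml holding `settings`."""
    root = ET.Element("global_preferences")
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


print(build_prefs_override(SUGGESTED))
```

Going through the Manager dialog avoids any doubt about tag names; the script only helps if you manage several machines.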




Feel free to ask for help if you have trouble with any of these. Once you've made the change, monitor it for a while to see if it has helped or not. Either way, we would appreciate knowing the outcome.
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 48149 - Posted: 11 Feb 2014, 14:04:05 UTC

This may not match your error, but I live in an area that gets power cuts. They are not that frequent (say once a month or so) and they don't last long, usually a minute at most. However, they cause a 'Computation Error' on the Climate Prediction application every damn time.

Does anyone know if there is any way I can set up BOINC so that CP will restart at the last clean point or something?

Bill H
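Whether a model survives a power cut depends on how the application itself writes its checkpoint and restart files; I don't know how the CPDN models do it, so this is not a BOINC setting you can change. As a generic illustration of what 'restart at the last clean point' needs, here is a minimal Python sketch of the write-to-a-temporary-file-then-rename pattern: if the power fails mid-write, the previous complete checkpoint is still on disk. The file name, the JSON format and the helpers `save_checkpoint` / `load_checkpoint` are all hypothetical. `os.replace` is atomic on POSIX filesystems when both paths are on the same filesystem; Windows gives weaker guarantees, and full durability on POSIX would also need an fsync of the directory, which this sketch skips.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write `state` to `path` so an interruption leaves the old or the new file, not a torn one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new bytes are on disk before the swap
        os.replace(tmp_path, path)  # swap in the new checkpoint with a single rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # leave the previous clean checkpoint untouched
        raise


def load_checkpoint(path: str, default: dict) -> dict:
    """Resume from the last complete checkpoint, or start from `default`."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


# Usage: a hypothetical model loop that checkpoints after every simulated day.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "model_state.json")
    state = load_checkpoint(ckpt, {"model_day": 0})
    for _ in range(3):
        state["model_day"] += 1  # stand-in for a unit of real computation
        save_checkpoint(ckpt, state)
    print(load_checkpoint(ckpt, {"model_day": 0}))
```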
MikeMarsUK
Volunteer moderator
Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 48150 - Posted: 11 Feb 2014, 14:39:08 UTC


When my power supply was dropping out regularly, I bought an old APC 2200VA UPS off eBay, which solved the problem. It did of course need a set of new batteries (that was the expensive bit; old UPS units themselves tend to be pretty cheap).

Phil

Joined: 17 Apr 14
Posts: 5
Credit: 1,709,304
RAC: 0
Message 50951 - Posted: 9 Dec 2014, 11:20:11 UTC

I keep getting a message on my screen about project/task hadam3p_pnw_graphics_7.22_windows_intelx86.exe telling me that the program can't start because MSVCR100.dll is missing from my computer. I have checked, and it is on the computer, under BOINC and also under Java. So my screen fills up with black windows, stacked like sheets of paper. How do I solve this?
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50965 - Posted: 12 Dec 2014, 2:42:58 UTC - in response to Message 50951.  

MSVCR100.dll is a Microsoft library file. It is probably missing from your core Windows system files.
Do an internet search for it. There are bound to be answers there.
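One possible reason the copies 'under BOINC and also under Java' don't help: as I understand Windows' DLL search order, a program looks in its own folder, the system folders and the folders listed in PATH (among others), so a copy tucked inside another application's folder is usually invisible to it. Here is a minimal Python sketch, using a hypothetical helper `find_on_path`, that checks only the PATH part of that search; it does not model the program-folder or system-folder steps, so an empty result is a hint rather than proof.

```python
import os
from typing import List, Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> List[str]:
    """List the PATH directories that contain `filename`.

    Deliberately simplified: Windows also checks the program's own folder and
    the system directories, which this sketch does not model.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [
        directory
        for directory in path_value.split(os.pathsep)
        if directory and os.path.isfile(os.path.join(directory, filename))
    ]


hits = find_on_path("MSVCR100.dll")
print(hits if hits else "MSVCR100.dll is not in any PATH directory")
```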


©2024 cpdn.org