Thread 'Where do all the errors come from?'

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 31337 - Posted: 12 Nov 2007, 19:22:53 UTC

This may be a stupid question.

A model is running, with Network Activity Suspended. Backups are being taken regularly. It crashes, as they do occasionally. A Backup is restored and crunching recommences. Networking is only ever turned on, briefly, to let a trickle or decadal zip file upload and then turned off again.

How then does the Server come to know about errors and list them on the model's Result page, given that the crash occurred, the Backup was restored and recovery was obtained, all without communication with the network?

Puzzled.

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 31338 - Posted: 12 Nov 2007, 19:39:50 UTC
Last modified: 12 Nov 2007, 19:44:58 UTC

Are the reported errors the ones which led to the fatalities? Or the usual litany we see from recent boinc versions?

There should be no way for 'knowledge' of an error/crash to carry over when a backup is restored. Copies of stderr/stdoutdae.txt and client_state.xml are returned to their pre-crash condition...

Can you point us to a specific case?

Edit: I'm assuming that you mean 'restore the entire boinc folder' when you say restore Backups. (Piecemeal attempts to 'restore', problematic at best, could leave old Trickles...)

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31339 - Posted: 12 Nov 2007, 20:38:59 UTC
Last modified: 12 Nov 2007, 20:40:04 UTC


I guess it may be this result:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6622010


How are you restoring from the backup? Are you copying the backup over the top of the normal folder, or are you renaming folders?

The reason I ask is that if you simply copy the backup folder over the top, files which exist in the original boinc folder but not in the backup folder are left intact. This includes result uploads which tell the servers that the model crashed.

I prefer to rename folders so that they stay separate. Of course, it doesn't actually matter whether the server thinks the model crashed or not, because the trickle uploads are the important thing from the point of view of the scientists (the model's status is ignored).
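
To make the copy-over-the-top problem concrete, here is a minimal Python sketch that lists files present in the live folder but absent from the backup. These are the files such a restore would leave behind. The folder names and the file name `crash_report_upload.xml` are made up for illustration; they are assumptions, not actual BOINC file names.

```python
import os
import tempfile


def relative_files(root):
    """Return every file path under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def stale_after_copy_over(live_dir, backup_dir):
    """List files a copy-over-the-top restore would leave untouched."""
    return sorted(relative_files(live_dir) - relative_files(backup_dir))


def demo():
    """Build two throwaway folders and report the leftover file."""
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "BOINC")
        backup = os.path.join(tmp, "BOINC_backup")
        os.makedirs(live)
        os.makedirs(backup)
        for folder in (live, backup):
            with open(os.path.join(folder, "client_state.xml"), "w"):
                pass
        # Hypothetical file name standing in for a post-crash result upload.
        with open(os.path.join(live, "crash_report_upload.xml"), "w"):
            pass
        return stale_after_copy_over(live, backup)


if __name__ == "__main__":
    print(demo())
```

Anything this reports would survive a plain copy of the backup onto the live folder, which is one way the server could still hear about a crash.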
I'm a volunteer and my views are my own.
News and Announcements and FAQ

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31340 - Posted: 12 Nov 2007, 21:22:02 UTC


In my how-to-backup, there are 2 things that are important:
1. If copying 'over the top' of the original, ALWAYS DELETE THE ORIGINAL FIRST.
2. ALWAYS re-boot the computer afterwards; otherwise you'll still have all of the old info stored in RAM, and some may be faulty.

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 31343 - Posted: 13 Nov 2007, 8:27:37 UTC

Yes, http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6622010 is the one that causes me to ask the question.

How do I do it?
I close down BOINC (File > Exit)
I delete all files and folders from c:\BOINC
I copy all files and folders from my backup copy and paste them into c:\BOINC
I confess I have not been rebooting at this point
I simply restart BOINC

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31345 - Posted: 13 Nov 2007, 11:54:27 UTC
Last modified: 13 Nov 2007, 11:55:37 UTC

I'd make the following modifications to your procedure...

Lockleys wrote:
How do I do it?
I close down BOINC (File > Exit)
I delete all files and folders from c:\BOINC

Instead of deleting, rename BOINC to BOINC_old.

Lockleys wrote:
I copy all files and folders from my backup copy and paste them into c:\BOINC

Instead, copy and paste your backup directory to c:\, then rename it to BOINC.

Lockleys wrote:
I confess I have not been rebooting at this point
I simply restart BOINC
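
The rename-based restore described in this post could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: BOINC has already been exited, the function name `restore_by_rename` and the `_old` suffix are invented here, and the demo uses throwaway folders rather than a real c:\BOINC.

```python
import os
import shutil
import tempfile


def restore_by_rename(live_dir, backup_dir, parked_suffix="_old"):
    """Park the live folder under a new name, then copy the backup in."""
    parked = live_dir + parked_suffix
    if os.path.exists(parked):
        raise FileExistsError(parked + " already exists; move it away first")
    os.rename(live_dir, parked)            # BOINC -> BOINC_old
    shutil.copytree(backup_dir, live_dir)  # a copy of the backup becomes BOINC
    return parked


def demo():
    """Restore into a temporary folder and show what ends up where."""
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "BOINC")
        backup = os.path.join(tmp, "backup")
        os.makedirs(live)
        os.makedirs(backup)
        # Hypothetical leftover written to the live folder after the crash.
        with open(os.path.join(live, "leftover_after_crash.txt"), "w"):
            pass
        with open(os.path.join(backup, "client_state.xml"), "w"):
            pass
        parked = restore_by_rename(live, backup)
        return sorted(os.listdir(live)), sorted(os.listdir(parked))


if __name__ == "__main__":
    print(demo())
```

Because the old folder is set aside rather than overwritten, nothing from the crashed run can remain in the restored folder, and the backup itself stays untouched for reuse.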


I'm a volunteer and my views are my own.
News and Announcements and FAQ

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31346 - Posted: 13 Nov 2007, 16:11:57 UTC


A couple of years back, I too wasn't rebooting, and couldn't work out why a string of backups weren't working. (I keep them on a different partition, and just keep adding to them until space gets short before deleting VERY old backups.)

Then I worked out that BOINC ("kept in memory") must be able to use this kept data when restarted. So I rebooted to flush the 'bad' data, and the backups all started working.

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 31347 - Posted: 13 Nov 2007, 18:01:38 UTC - in response to Message 31346.  





That's very interesting and, though it seems to defy some logic, I'll follow that course. I assume "rebooting" means rebooting the whole machine, not just BOINC and its apps?

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31348 - Posted: 13 Nov 2007, 18:41:04 UTC


Yup.

I usually reboot each time I back up Boinc simply because my PC starts to go slow and then crashes if it is up too long (I think it's due to a memory leak in VSMON - my virus checker).
I'm a volunteer and my views are my own.
News and Announcements and FAQ

old_user479742
Joined: 29 Oct 07
Posts: 4
Credit: 39,104
RAC: 0
Message 31352 - Posted: 14 Nov 2007, 2:08:31 UTC - in response to Message 31348.  





I am confused. You all seem to know your BOINC and climate prediction very well..... My question is a simple one...

Why does it say computation error? I did 75 hours on one 825 hour file and it simply said computation error....

Is it my PC, or was the file that I received simply flawed?

Also, is there any chance of overheating on my PC if I leave it on all the time with boinc running at 50% of CPU time in the background? I have a 2.8 Dell dimension 92000 with 4 GB of RAM.


THANKS!

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 31353 - Posted: 14 Nov 2007, 2:52:26 UTC
Last modified: 14 Nov 2007, 3:34:11 UTC

Well, I had a lengthy reply composed, hit 'Post reply' -- and next saw the download page. Mumble, mumble. (Let's hear it for the boinc BB!)

Without redoing the entire thing (as though I could):
You have 3GB of memory. Not usual. Likely 2*1GB plus 2*512MB. Same manufacturer? Same timings?

Vista: Where is your boinc folder? If in C:\Program Files, that's a problem. Please put it anywhere else. D:\boinc would be good. (I format my hard disks to give boinc its own Partition. This has advantages.)

You lost two Models with similar errors. '22' is a catch-all error and tells us nothing useful. Did you install the latest graphics drivers?

Is the box overclocked (not sure it can be done on a Dell)?

Do you run heavy-resource progs like games or video editing with boinc/CPDN active?

I forget what else I wrote...

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31358 - Posted: 14 Nov 2007, 5:00:32 UTC


Overheating.
Hmmm. Dell - made for a price. Probably a minimum of case cooling, perhaps just the power-supply-unit fan. So minimum air flow through the case to cool the processor.
You're living in Canada, so room temps should be getting low.
Which leaves the CPU heatsink: is it dust free? Dust acts as an insulator, so the processor heat can't escape.

But lots of us leave our computers on 24/7, running full processor power. :)

As for the models that failed: this can be because of something going wrong with the computer, or it can indeed be the dataset for the model. Sometimes the combination of values used for the model can make it unstable, so the model will then crash. One such instance is the well-known "Negative pressure".

This failure before the end target year is part of the experiment; the researchers want to know which combinations cause it, and there's only one way to find out.

PS
Bad luck Astro. I keep thinking that I should "Ctrl C" my posts first, but I always forget.


Backups: Here

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31367 - Posted: 14 Nov 2007, 14:14:48 UTC
Last modified: 14 Nov 2007, 14:15:08 UTC

This bit was buried at the far end of the error log. While it's not very clear, it might be due to the CPU being used by something else for an extended period of time.

CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(


This is the kind of error which backups can solve (see the 'backups and restores' readme via the link in my signature).

If you are going to be running something which uses the CPU on the PC for an extended period of time (such as a game, video compression / ripping, etc), then I'd suggest exiting from Boinc first. Just right-click on the icon and select 'exit'.

I'd also recommend scanning through the other readmes at the same time to see if there is anything of interest.
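
For anyone wanting to spot this pattern in a long error log, a small Python sketch along these lines could count the restart attempts and note whether the model gave up. It assumes only the two message texts quoted in this post; the function name is invented, and the sample is rebuilt from the lines shown above rather than read from a real file.

```python
RESTART_MARKER = "Model crash detected"
GIVE_UP_MARKER = "too many model crashes"


def summarize_crash_log(stderr_text):
    """Return (number of restart attempts, whether the model gave up)."""
    restarts = 0
    gave_up = False
    for line in stderr_text.splitlines():
        if RESTART_MARKER in line:
            restarts += 1
        elif GIVE_UP_MARKER in line:
            gave_up = True
    return restarts, gave_up


# Sample rebuilt from the log excerpt quoted in this post.
SAMPLE_LOG = (
    "CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3156, iMonCtr=1\n"
    "Model crash detected, will try to restart...\n"
) * 6 + "Sorry, too many model crashes! :-(\n"


if __name__ == "__main__":
    print(summarize_crash_log(SAMPLE_LOG))
```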
I'm a volunteer and my views are my own.
News and Announcements and FAQ

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 31384 - Posted: 15 Nov 2007, 8:17:29 UTC - in response to Message 31367.  





Thanks, MikeMars. Yes, I'm pretty disciplined about frequency of taking backups and am aware of the need to exit BOINC when doing mill-intensive stuff, but haven't a clue about what could have caused that specific error message. Ah well, some things are intended to remain a mystery, I suppose.

old_user479742
Joined: 29 Oct 07
Posts: 4
Credit: 39,104
RAC: 0
Message 31449 - Posted: 22 Nov 2007, 6:18:53 UTC - in response to Message 31353.  



Hey,

I had another computation error.
This is what my screen says:

21/11/2007 1:49:36 AM||General prefs: using your defaults
21/11/2007 1:49:36 AM||Reading preferences override file
21/11/2007 1:49:36 AM||Preferences limit memory usage when active to 1534.57MB
21/11/2007 1:49:36 AM||Preferences limit memory usage when idle to 1534.57MB
21/11/2007 1:49:36 AM||Preferences limit disk usage to 27.77GB
21/11/2007 1:52:52 PM|climateprediction.net|Deferring communication for 1 min 0 sec
21/11/2007 1:52:52 PM|climateprediction.net|Reason: Unrecoverable error for result hadcm3iozn_cpnx_2000_80_125899030_2 (The device does not recognize the command. (0x16) - exit code 22 (0x16))
21/11/2007 1:52:54 PM|climateprediction.net|Computation for task hadcm3iozn_cpnx_2000_80_125899030_2 finished
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_1.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_2.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_3.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_4.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_5.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_6.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_7.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 1:52:54 PM|climateprediction.net|Output file hadcm3iozn_cpnx_2000_80_125899030_2_8.zip for task hadcm3iozn_cpnx_2000_80_125899030_2 absent
21/11/2007 6:39:36 PM||Resuming network activity
21/11/2007 6:39:36 PM|climateprediction.net|Sending scheduler request: Requested by user
21/11/2007 6:39:36 PM|climateprediction.net|Requesting 30240 seconds of new work, and reporting 1 completed tasks
21/11/2007 6:39:41 PM|climateprediction.net|Scheduler RPC succeeded [server version 509]
21/11/2007 6:39:44 PM|climateprediction.net|[file_xfer] Started download of file hadsm3fub_0332_005911804.zip
21/11/2007 6:39:45 PM|climateprediction.net|[file_xfer] Finished download of file hadsm3fub_0332_005911804.zip
21/11/2007 6:39:45 PM|climateprediction.net|[file_xfer] Throughput 16009 bytes/sec
21/11/2007 6:39:46 PM|climateprediction.net|Starting hadsm3fub_0332_005911804_6
21/11/2007 6:39:46 PM|climateprediction.net|Starting task hadsm3fub_0332_005911804_6 using hadsm3 version 506
21/11/2007 6:44:36 PM||Suspending network activity - user request
21/11/2007 10:01:31 PM||Suspending computation - user request
22/11/2007 1:00:17 AM||Resuming computation
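
A log like the one above can be filtered for the failing task with a short Python sketch. It assumes the `timestamp|project|message` layout and the Windows wording "exit code N" exactly as they appear in this post; other BOINC versions or platforms may word the message differently, and the function name is made up for illustration.

```python
import re

# Matches the wording seen in this post; other clients may phrase it differently.
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) .*exit code (-?\d+)")


def unrecoverable_errors(log_text):
    """Return (timestamp, project, result name, exit code) for each failure."""
    hits = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue
        timestamp, project, message = parts
        match = ERROR_RE.search(message)
        if match:
            hits.append((timestamp, project, match.group(1), int(match.group(2))))
    return hits


# Two lines copied from the log above.
SAMPLE = (
    "21/11/2007 1:49:36 AM||Reading preferences override file\n"
    "21/11/2007 1:52:52 PM|climateprediction.net|Reason: Unrecoverable error "
    "for result hadcm3iozn_cpnx_2000_80_125899030_2 (The device does not "
    "recognize the command. (0x16) - exit code 22 (0x16))\n"
)

if __name__ == "__main__":
    for hit in unrecoverable_errors(SAMPLE):
        print(hit)
```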

I did what some of you guys said: I changed where my BOINC is located. It is now on my G drive, which has some 70 GB of space, of which climate prediction takes up to 1.3 GB. I normally leave it on all the time since I never shut down my PC. I set it to 50% CPU time usage to avoid overheating.

This type of error never happened with SETI or Rosetta. In fact I did some 30000 hours with Rosetta straight, never shutting down my PC, and I never once had a problem or overheating.

The stats of my pc are the following:
Manufacturer: Dell
Model: Dimension DXP061
Windows experience index rating: 5.5
Processor: Intel(R) Core(TM) 2 Quad CPU 2.40 GHz 2.39 GHz
Memory (RAM): 3070 MB (I physically have 4 GB installed, but Vista can only see 3 GB max).
System type: 32-bit operating system
Windows edition: Windows Vista Home Premium


So my questions are:
1) What causes these computation errors?
2) What can I do to fix them (since I have already allocated a disk for it)?
3) Why does it never happen with others like Rosetta and SETI?

Anyhow, I hope someone can help me...

If this continues I will most likely drop climate prediction and continue with Rosetta and SETI alone.

THANKS all!

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31452 - Posted: 22 Nov 2007, 8:33:25 UTC
Last modified: 22 Nov 2007, 8:36:38 UTC

The most recent one was this:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6965692
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(

There was a similar one a day before:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6965694


A different one (16th Nov) was this:
exit code -1073741502 (0xc0000142)


The latter two crashes seemed to happen in the late evening, UK time. Do you recall what was happening on the PC at those times, perhaps a game or something else which uses 100% of CPU time for an extended period?


The 0xc0000142 happens when the PC is about to crash, and can't start any new processes. The very latest version of the Boinc Manager, 5.10.30 (in testing, not released) handles the C0000142 error better.
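
The decimal and hexadecimal forms of that exit code are the same 32-bit value read as signed and unsigned. A short Python sketch of the conversion (the helper names are invented for illustration):

```python
def as_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # The exit code quoted in this post, shown as eight hex digits.
    print(format(as_unsigned32(-1073741502), "#010x"))
    print(as_signed32(0xC0000142))
```

On Windows, 0xC0000142 is documented as the NTSTATUS value STATUS_DLL_INIT_FAILED, which fits the description of a process that could not be started.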


1) Possibly due to games or other stuff running at the same time, or out-of-date drivers for your motherboard graphics? There are other possible causes.

2) Try right-clicking and selecting 'exit' on the boinc icon before playing games, doing anything else which uses 100% of CPU time such as video encoding, or shutting down the system. Also try disabling the Boinc screensaver, and see if you can find an update for the graphics drivers on your PC (you'll need to look on Dell's website).

3) Firstly, if one in a hundred Rosetta jobs was failing, you'd probably not notice. Because the climate project runs so much longer, a single failure is more obvious. However, the climate model is more sensitive than other Boinc tasks about applications which use 100% of the CPU time at normal priority.

I'd recommend that you have a read through the 'READMEs' to see if there is anything which looks relevant (link in my signature).
I'm a volunteer and my views are my own.
News and Announcements and FAQ

old_user479742
Joined: 29 Oct 07
Posts: 4
Credit: 39,104
RAC: 0
Message 31463 - Posted: 23 Nov 2007, 19:56:04 UTC - in response to Message 31452.  




SHIT, I remember what it was. My brother plays Counter-Strike in those time frames. That's why the CPU may be generating errors for BOINC. I will tell him to exit BOINC when he plays his game.

Thanks a lot. I think now it should all be OK.

Sorry for the numerous long posts and my slowness at understanding!

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31464 - Posted: 23 Nov 2007, 20:14:12 UTC


Glad that you've managed to solve the mystery :-)

I'm a volunteer and my views are my own.
News and Announcements and FAQ

Skip Da Shu
Joined: 31 Aug 04
Posts: 42
Credit: 15,308,708
RAC: 298
Message 31563 - Posted: 2 Dec 2007, 5:20:12 UTC
Last modified: 2 Dec 2007, 5:28:30 UTC

Well y'all didn't make me feel all warm and fuzzy about solving my errors... let's take a run at it.

Today (after a few days of winding down WUs on the machine) I formatted the HDD and installed Xubuntu v7.10 (64bit) on this machine. It's been running WinXP for some time with multiple projects on it. It's an AMD X2 4200+ with 2 x 256MB of PC4000 RAM. It's a dedicated number cruncher and is normally "headless".

I installed the v5 stdc++ libs (Gutsy comes with v6) required by several project apps (QMC, E&H, Leiden, WCG and perhaps CPDN). I used the package install to get the AMD64 version of BOINC v5.10.8 up and running as a daemon. I encountered these errors:

Sat 01 Dec 2007 06:24:50 PM CST|QMC@HOME|Reason: Unrecoverable error for result three_ad_anthracene.3996_0 (process exited with code 22 (0x16, -234))

Sat 01 Dec 2007 06:25:47 PM CST|climateprediction.net|Reason: Unrecoverable error for result hadsm3fub_0107_005913005_1 (process exited with code 22 (0x16, -234))

Sat 01 Dec 2007 06:25:51 PM CST|Einstein@Home|Reason: Unrecoverable error for result h1_0666.20_S5R2__265_S5R3a_1 (process exited with code 22 (0x16, -234))

Sat 01 Dec 2007 06:25:53 PM CST|World Community Grid|Reason: Unrecoverable error for result dddt0201k0629_ZINC06913243-0000_00_0 (process exited with code 22 (0x16, -234))


One thing that makes me think it's app dependent is that one of WCG's other apps runs fine.

Any thoughts?

PS: I see "execv: No such file or directory" in this returned result http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7013096
- da shu @ HeliOS,
"Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer"

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31564 - Posted: 2 Dec 2007, 7:47:20 UTC
Last modified: 2 Dec 2007, 7:49:29 UTC

Error 22 is a well-known problem that I don't think has a solution.
You'll find many mentions of it both in this Number crunching section, and on the Questions and answers section of these boards.
It's not Linux specific.

Some thoughts on it are here.


Backups: Here