climateprediction.net (CPDN) home page
Thread 'BOINC 6.2.6 affecting model stability?'

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 34066 - Posted: 13 Jun 2008, 13:46:00 UTC

I've recently installed BOINC 6.2.6 on my 1.83 GHz Core Duo MacBook. In the last two days, I've had two models crash. Is the new version of BOINC to blame? If so, I'll revert back to 5.10.45.
ID: 34066

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34067 - Posted: 13 Jun 2008, 14:39:01 UTC

The XML read errors in 7481057 certainly look odd. Unless there's some permissions change, that should all run smoothly.

Experience is building with BOINC 6, but it's not being generally recommended here yet. Reversion is the best option.
ID: 34067

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 34983 - Posted: 14 Sep 2008, 10:17:40 UTC

This may be unrelated to my previous post, but I've noticed a new wrinkle in model instability. If I run a CPDN model, my OS becomes unstable--it keeps crashing when I attempt to log out, restart, or shut down (gray screen of death, need to restart by holding down the power button, etc.). After the manual restart, my CPDN model shows as finished due to a computation error. Since I stopped running climate models on 29 Aug, I have not experienced any OS crashes. I'd like to continue running models but am not thrilled with the crashes. Any suggestions?

Thanks.
ID: 34983

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34991 - Posted: 14 Sep 2008, 19:27:00 UTC
Last modified: 14 Sep 2008, 19:27:21 UTC

There haven't been any reports of PC crashes caused by the science application. However, an intensive application like this could uncover problems:

- if the PC has some cooling problem that CPDN exacerbates by running continuously

- if there is a memory problem not explored by normal PC use but exposed by a 'large' application like CPDN (I had a duff memory stick that caused problems only for CPDN; it was replaced under warranty - I was amazed)

- if disk space has run out (though your description of the problem stopping when CPDN isn't running eliminates that).

It might therefore be a good idea to run the PC's diagnostics or one of the stress tests - for a long time (a day). And check the PC's cooling: domestic PCs are particularly prone to problems because of carpets and a generally messier environment than an office.

Not much help, I know.

BTW BOINC 6.2.18 is now the recommended version on the CPDN download page.
ID: 34991

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 35012 - Posted: 16 Sep 2008, 18:51:07 UTC

I might add that this problem only showed up after I upgraded to BOINC 6. I had no problems using BOINC 5.
ID: 35012

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 35159 - Posted: 1 Oct 2008, 1:11:46 UTC

Update:

Installing the OS 10.5.5 update on my MacBook appears to have fixed something. I'm 150 hours into a HADCM model and have had no problems. Everything appears to be stable again. No problems when shutting down or restarting the computer.
ID: 35159

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 35339 - Posted: 22 Oct 2008, 16:26:32 UTC - in response to Message 35159.  

I spoke much too soon. My HADCM model crashed yesterday, just like all the others. I have not had a model come close to completing since I upgraded to BOINC 6.x.x. Time to downgrade and see if that solves anything.
ID: 35339

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 35346 - Posted: 23 Oct 2008, 11:17:54 UTC

My models are all crashing with "code 22" and "shmget: No such file or directory." There's usually a "No heartbeat from core client for x seconds" thrown in for good measure.

I've increased shared memory using the Terminal code given in the "Fixing Error Code 6" thread, but am at a loss to solve the "shmget: No such file or directory" code.

As for the "No heartbeat" code: my crashes seem to occur only when I restart or shut down my computer and will take down the entire system--I get the Apple version of the screen of death. However, I usually close out all applications prior to restarting, so it seems odd that the climate models would still be running after I've closed BOINC.

I've just downgraded back to 5.10.45, so I'll also see if that does anything to solve the problem. It seemed that the crashes started after I upgraded to BOINC 6.2.x. As for the one that crashed on 5.10.45: I jumped the gun on upgrading to 6.2.x. That model crashed when I upgraded before it finished.
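
A note on reading the "shmget: No such file or directory" text: it has the usual "call: error text" shape that C programs print when a system call fails, and "No such file or directory" is the standard description of the errno value ENOENT. For shmget, POSIX specifies ENOENT for the case where no shared memory segment exists for the requested key and the caller did not ask to create one. That may suggest the model was looking for a segment that had already gone away, which would be a different problem from the segment size limit raised in the "Fixing Error Code 6" thread. The following is a minimal sketch in Python, not part of BOINC or CPDN; the helper name `explain_errno_message` is made up for illustration.

```python
import errno
import os


def explain_errno_message(message):
    """Split a 'call: error text' message and find the errno names whose
    standard description matches the error text (hypothetical helper)."""
    call, _, text = message.partition(": ")
    text = text.strip().rstrip(".")  # drop sentence punctuation, if any
    names = sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == text
    )
    return call, names


if __name__ == "__main__":
    call, names = explain_errno_message("shmget: No such file or directory")
    print(f"failing call: {call}; matching errno names: {names}")
```

The lookup only translates the message back into an errno name; it cannot say why the segment was missing.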
ID: 35346

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35347 - Posted: 23 Oct 2008, 13:23:27 UTC

One of the features of BOINC 6, at least on Windows, is a preference for the 'service' installation (which may be called something else on Mac). Closing down BOINC Manager does not close the service down: that needs a second action. This two-stage process can cause problems for CPDN users who want to do a backup, because some locks are still held by the service, even when BOINC Manager appears to have gone. (BOINC has no concept of 'backup', so it doesn't really accommodate projects like CPDN with work units so long that a backup is prudent.)

So, perhaps the service/daemon/whatever needs to be shut down too. It may also be possible to change its properties so that it doesn't start automatically, if you think that the crashes occur on starting rather than stopping. I always suspend the model before re-booting, to prevent the restarting model having to compete with all the junk that runs on re-booting (Norton Anti-Virus, specifically).
ID: 35347

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35349 - Posted: 23 Oct 2008, 15:46:21 UTC
Last modified: 23 Oct 2008, 15:47:18 UTC

Quoting from Jorden's FAQ about BOINC6:

Q. How do I shut BOINC down now? Exiting BOINC Manager, BOINC and the applications keep on running! And how do I get it running afterwards?

A. There are a couple of ways to shut down the service.
The easiest method to shut down BOINC and get it running again is to go through BOINC Manager->Advanced view->Advanced->Shut down connected client. This will shut down the service. Next you go File->Exit to close down BOINC Manager.

To start BOINC back up, go Start->Programs->BOINC->BOINC Manager. This will start up BOINC Manager, which in turn starts the service.


Jim, you got the first of your current type of error messages when you still had BOINC5, so I don't think the problem is the BOINC version. I don't think it's the type of model either as you've had the same error messages with both HADSMs and HADCMs.

While we try to find out what the error messages mean, I think you should start backing up the complete contents of your BOINC Data folder regularly. So you'd be able to restore a crashed model and continue it from the restore point. There's a selection of backup methods in the README collection linked in my sig.
Cpdn news
ID: 35349

old_user216062
Joined: 29 Dec 06
Posts: 17
Credit: 379,624
RAC: 0
Message 35356 - Posted: 24 Oct 2008, 1:28:45 UTC - in response to Message 35349.  


While we try to find out what the error messages mean, I think you should start backing up the complete contents of your BOINC Data folder regularly. So you'd be able to restore a crashed model and continue it from the restore point. There's a selection of backup methods in the README collection linked in my sig.


I use Time Machine to automatically back up my entire computer to an external hard drive. I'm supposed to be able to restore any file. Will that work in this case? If so, should I restore the entire BOINC folder or just the climateprediction folder inside the BOINC folder?

Thanks.
ID: 35356

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35357 - Posted: 24 Oct 2008, 2:07:16 UTC

As several of us have posted time and again on all of the climate boards, it's necessary to save a copy of the COMPLETE BOINC folder and all of its sub-folders, and to also restore ALL of these folders in the case of a model failure.
In the case of BOINC version 6, only the COMPLETE BOINC data folders need to be backed up and restored.

All of which is explained in the BACKUP section of these README posts.
Which are also linked to from my signature below.


Backups: Here
ID: 35357

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35360 - Posted: 24 Oct 2008, 9:23:30 UTC
Last modified: 24 Oct 2008, 9:26:24 UTC

Jim, I'd advise choosing a backup method from the CPDN README collection as Les advises rather than relying on your automatic backup program. Auto backup programs don't exit from BOINC first. If BOINC is running when a backup's made (even if the model's suspended) the backup may not be restorable. Or it may restore but the tasks won't run.

So if I were you I'd keep using the auto backup, but in addition choose a backup method from the README collection just for the BOINC (or BOINC Data) folder contents.

If you have BOINC6 installed as a service, Jorden explains in his FAQ how to exit fully:

Q. How do I shut BOINC down now? Exiting BOINC Manager, BOINC and the applications keep on running! And how do I get it running afterwards?
A. There are a couple of ways to shut down the service.
The easiest method to shut down BOINC and get it running again is to go through BOINC Manager->Advanced view->Advanced->Shut down connected client. This will shut down the service. Next you go File->Exit to close down BOINC Manager.

To start BOINC back up, go Start->Programs->BOINC->BOINC Manager. This will start up BOINC Manager, which in turn starts the service.
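
The advice above (exit BOINC fully, then copy the complete data folder with all its sub-folders) could be scripted roughly as follows. This is only a sketch under stated assumptions, not one of the methods from the README collection: it assumes the core client shows up in the process list under the name `boinc`, and the function names and the timestamped folder naming are invented for illustration.

```python
import shutil
import subprocess
import time
from pathlib import Path


def running_commands():
    """Return the base names of all running commands, as reported by `ps`."""
    out = subprocess.run(
        ["ps", "-A", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return {Path(line.strip()).name for line in out.splitlines() if line.strip()}


def backup_boinc_data(data_dir, backup_root, client_name="boinc",
                      list_commands=running_commands):
    """Copy the complete data folder into a new timestamped folder.

    Refuses to run while a process called `client_name` is alive, because a
    backup taken with BOINC running may not be restorable.
    `client_name` is an assumption about how the core client is named.
    """
    if client_name in list_commands():
        raise RuntimeError("BOINC client is still running; shut it down first")
    src = Path(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # Copy the whole tree, sub-folders included, never a single project folder.
    shutil.copytree(src, dest, symlinks=True)
    return dest
```

On a Mac the call might look like `backup_boinc_data("/Library/Application Support/BOINC Data", "/Volumes/Backup/boinc")`, where the backup volume is a made-up example. Restoring is the reverse: with BOINC shut down, replace the whole data folder with the saved copy rather than restoring individual files.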

Cpdn news
ID: 35360

©2024 cpdn.org