climateprediction.net (CPDN) home page
Thread 'misconfigured BOINC crashing work units'


Questions and Answers : Macintosh : misconfigured BOINC crashing work units
mike armstrong

Joined: 20 Jun 05
Posts: 1
Credit: 2,463,117
RAC: 0
Message 35490 - Posted: 14 Nov 2008, 17:27:28 UTC

Just received this email and do not understand it. Way out of my league.


Help?

mike


climateprediction.net notification:

Dear mike armstrong
Your machine (host # 878606) described below appears to have a misconfigured BOINC
installation resulting in it crashing workunits. Would you please have a look at it?

Sincerely,
The climateprediction.net team


This is the content of our database:
ID: 878606
Created: 13 Jun 2008 8:29:05 UTC
Venue: home
Total credit: 111352.320967913
Average credit: 797.616802835029
Average update time: 14 Nov 2008 7:53:45 UTC
IP address: 192.168.1.100 (same the last 395 times)
Domain name: mike-armstrongs-imac.local
Local Time = UTC +0 hours
Number of CPUs: 2
CPU: GenuineIntel Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz [x86 Family 6 Model 15 Stepping 11]
FP ops/sec: 2055598762.29746
Int ops/sec: 5435676450.93856
memory bandwidth: 1000000000
Operating System: Darwin 9.5.0
Memory: 1024 MB
Cache: 976.56 KB
Swap Space: 71421.82 MB
Total Disk Space: 200.88 GB
Free Disk Space: 69.5 GB
Avg network bandwidth (upstream): 20872.354709 bytes/sec
Avg network bandwidth (downstream): 95866.167839 bytes/sec
Average turnaround: 0 days
Number of RPCs: 975
Last RPC: 14 Nov 2008 0:32:49 UTC
% of time client on: 95.725 %
% of time host connected: -100 %
% of time user active: 99.9233 %
# of results today: 2

For further information and assistance with climateprediction.net go to
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 35494 - Posted: 14 Nov 2008, 18:50:32 UTC

If you look at the list of tasks for your Mac you'll see that 37 tasks have crashed immediately since 27th October. Click on any of the task ID links and then click on the '+' button after stderr out and you'll see that they've all failed with the error "Insufficient Memory/Stack Space Available!"

That's the error discussed here. The problem seems to be restricted to HADCM3 version 6.* tasks on Mac, so your best option until the cause is identified would be to change your project preferences to avoid HADCM3 models.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
old_user539218

Joined: 29 Sep 08
Posts: 5
Credit: 4,330,352
RAC: 0
Message 35495 - Posted: 14 Nov 2008, 19:53:17 UTC - in response to Message 35494.  

I got this email today as well; however, all my work units appear to be fine. Scanning through the stderr_um files, they're all 0 bytes.

I'm not sure what to do.

Here's my client ID:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=924187
crandles
Volunteer moderator

Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 35496 - Posted: 14 Nov 2008, 20:43:28 UTC - in response to Message 35495.  
Last modified: 14 Nov 2008, 20:48:54 UTC


16 models downloaded to an 8-CPU computer does not sound like too many, especially when 8 of the models have had credit granted. The email was supposed to have been sent to 80 troublesome hosts, but my first reaction is that you shouldn't have been sent the email in respect of that computer. But it could easily be me misunderstanding. I'll try and find out.
Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35497 - Posted: 14 Nov 2008, 20:56:57 UTC
Last modified: 14 Nov 2008, 21:00:24 UTC

Hi Toby

I've looked at what your three computers are doing and they seem to be fine. I'd just set the 8-core machine to "No new tasks" for CPDN in the Projects tab of BOINC Manager. It has 16 models, which will keep it busy for a while! Another moderator and I think you've received this email in error. (Mike, who posted above, did need the advice given.) We've asked an administrator to check how the server selects members for this email.

Sorry about that - it does seem to be a mistake on the part of the project.
Cpdn news
old_user539218

Joined: 29 Sep 08
Posts: 5
Credit: 4,330,352
RAC: 0
Message 35498 - Posted: 14 Nov 2008, 21:06:26 UTC - in response to Message 35497.  

I did reset the project because I was getting:

08-Nov-2008 19:21:42 [climateprediction.net] Task hadsm3mh_kl2v_006005487_7 exited with zero status but no 'finished' file
08-Nov-2008 19:21:42 [climateprediction.net] If this happens repeatedly you may need to reset the project.

So the first 8 tasks will never run. Can you reassign them from your end?

Thanks for your help, and no problem about the mistake; we're all human.
Profilemo.v
Volunteer moderator
Avatar

Send message
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35500 - Posted: 14 Nov 2008, 21:30:28 UTC

OK, that explains why you apparently have 16 tasks running. When the server updates that computer's tasks page, 8 of the models will be shown as aborted. I'd come back to ask you how you'd managed to get 16 models! You can see all this info for yourself by clicking on your name here on the forum and then going through the links.

The zero status message is usually benign. If you get it again, please don't reset the project. That causes you to lose every model you're running. To tell the truth, this message from BOINC is a terrible nuisance. I think it should instead advise members who get the message repeatedly to ask for advice on the project forum.

If you go to the CPDN READMEs linked in my signature and, in the collection about Crashes and Problems, look at item #6 by MikeMars, you'll find info about this and all the other common things that can go wrong, and what we need to do to keep these long models going to the end.

The criterion for the email turns out to be more than 10 model downloads in a week. Anyway, if the email leads you to the project READMEs it will have been worthwhile.
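For illustration only, a rule like "more than 10 model downloads in a week" could be checked roughly as below. This is a minimal Python sketch, not the project's server code: it assumes a rolling 7-day window (the thread doesn't say whether a rolling or calendar week is used), and the function name `flag_host` and the example timestamps are hypothetical. It shows why the 8-core machine tripped the check: 8 downloads, a reset, then 8 more within the same week gives 16, which is more than 10.

```python
from datetime import datetime, timedelta

# Hypothetical sketch; the real CPDN server query is not shown in this thread.
WINDOW = timedelta(days=7)  # assumed rolling one-week window
THRESHOLD = 10              # "more than 10 model downloads in a week"


def flag_host(download_times):
    """Return True if any rolling 7-day window holds more than THRESHOLD downloads."""
    times = sorted(download_times)
    start = 0
    for end, t in enumerate(times):
        # Advance the window start until times[start] is less than 7 days before t.
        while t - times[start] >= WINDOW:
            start += 1
        # Number of downloads inside the current window.
        if end - start + 1 > THRESHOLD:
            return True
    return False


# Hypothetical timestamps: 8 models, a project reset, then 8 more the next day.
first_batch = [datetime(2008, 11, 7, 12, 0)] * 8
second_batch = [datetime(2008, 11, 8, 19, 30)] * 8
print(flag_host(first_batch + second_batch))  # True
```

A host that downloads one model a day never has more than 7 in any window, so it would not be flagged; that is the sense in which the rule catches crashing hosts but also catches a healthy 8-core machine after a reset.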
Cpdn news
old_user539218

Joined: 29 Sep 08
Posts: 5
Credit: 4,330,352
RAC: 0
Message 35502 - Posted: 14 Nov 2008, 21:49:47 UTC - in response to Message 35500.  

snip


It got 8, then I had the nuisance message. Since that computer is brand new, I thought it was legit, so I did the reset. Then I got another 8; I assume that's the default for an 8-core machine. I agree with you: the message implies that you have a serious problem, when it's probably nothing to worry about.

I have read the READMEs, so probably some added value :)

I suppose more than 10 in a week is a good indication of problems for 99% of people; not many people have an 8-core machine at home.

Thanks for all your help, I know where to look before being rash next time.

©2024 cpdn.org