Message boards : Number crunching : Premature finish of hadam3p tasks
Author | Message |
---|---|
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Hi, Computer ID: 1142892. My last 7 hadam3p tasks have all failed with 'error whilst computing' almost immediately after starting. Is there a general problem, or should I simply try starting new tasks? (Or is the problem at my end?) Regards, David |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I see that on your other computer, which is also a Mac, there is no problem, and that up till 27th January you were completing these models OK. This seems to rule out most of the ideas I might have had from perusing these fora. Has anything changed on the machine in question? If it is a general problem, I guess you will see it when the other machine next finishes a task. All I can tell you is that if it is a general problem, it probably only affects Macs, as my Linux machine doesn't have any problem and has just started running two new hadam3p tasks. Perhaps anyone running Windows could confirm that those machines are not affected. My guess is that they won't be, or with the number of Windows machines out there someone would have posted by now. If it is a general Mac problem, I guess we will know within 24 hours. Dave |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
Did you upgrade or reinstall BOINC? There is a bug in BOINC/CPDN for Mac whereby an upgrade changes the file permissions so that every subsequent model, for any application type that has already been run, will crash. The solution is to reset the project, which clears out all the downloaded applications so that they can be re-downloaded with the correct permissions. A fix has been developed for the CPDN side of things but has not quite been released yet. [Edit: Doesn't look like it: comparing a success and a failure (eventually, once the pages load) shows 6.10.58 for both. Virus checker?] |
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Haven't upgraded BOINC - both my Macs use 6.10.58. Also, there is no virus checker on either machine. David |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's a sticky post at the top of the Macintosh section of this board about this problem, but the increased security (sandboxing) which caused it was only supposed to occur with BOINC 6.12.* It might help to try a Project Reset. If it doesn't, then you'll have to stop trying to run the Regional models until a new version of them for the Mac is released. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I know I don't use a Mac, but I'm not sure I understand this one, as the machine had been running hadam3p models fine up till 27th January. But then I don't understand a lot of the foibles of my own Linux box either, so maybe that doesn't mean anything. (-: |
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
I can't think of anything at my end to account for this - no changes of any kind to this Mac. I was running Malaria Control work for a few weeks until CPDN had more work available. However, I didn't get an error status in the BOINC Manager messages; as the following shows, the task just stopped: climateprediction.net Fri Feb 24 11:09:44 2012 Starting hadam3p_saf_1qr6_1977_1_006991306_ climateprediction.net Fri Feb 24 11:13:05 2012 Starting task hadam3p_saf_1qr6_1977_1_006991306_2 using hadam3p_saf version 609 climateprediction.net Fri Feb 24 11:13:05 201 Computation for task hadam3p_saf_1qr6_1977_1_006991306_2 finished climateprediction.net Fri Feb 24 11:13:05 2012 Output file hadam3p_saf_1qr6_1977_1_006991306_2_1.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent. . . . climateprediction.net Fri Feb 24 11:13:05 2012 Output file file hadam3p_saf_1qr6_1977_1_006991306_2_12.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent climateprediction.net Fri Feb 24 11:13:05 2012 Output file hadam3p_saf_1qr6_1977_1_006991306_2_13.zip for task hadam3p_saf_1qr6_1977_1_006991306_2 absent The error message appeared only in the task information in my account data. I might be wrong, but I think this problem is affecting other people as well: see workunits 7967420, 7967421 and 7083006 as examples. Regards, David |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
A more detailed error message is in the stderr section of the model results page (if you're prepared to wait - there's an absurd delay at the moment): <stderr_txt> This shows that the two atmosphere and ocean model processes cannot even start properly, which is what happens with the Mac permissions bug described earlier. So Les's advice is good advice: try a project reset - you've got nothing to lose as nothing is running. As far as the other crashes are concerned, the machines in 7967420 are also Macs - and their error messages are the same as yours. If you look at the machines then you will notice that their 'average credit' is zero - i.e. they've been serially trashing as many models as they can download for some time. This is a situation where (much-maligned) credits ought to be useful - it does surprise me that so many Mac users haven't noticed over a long period of time that their machines are producing absolutely nothing. The bug isn't the fault of the users and volunteer computing should require as little effort as possible, but it's never been like that in my experience - checking on models is needed and an occasional visit to the message boards wouldn't hurt either. PS I should say that I have a Mac and it runs fine, having done the required reset after each upgrade/re-install ... |
Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Thanks, guys. I've done as you suggested and reset the project, and the latest downloads are running OK. It just seemed bizarre that, with no changes to my system, a bug that surfaced with a new version of BOINC should suddenly appear in an older version! Especially since my other Mac, which has the same configuration, trundles along quite happily. As you say, Iain, it's really surprising that the other failures haven't been spotted before. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Surprising to me too. I had followed some of the posts about this problem but had not heard of it appearing out of the blue on a machine that had had no changes made to it. Some change in the hadam3p tasks? But again, why one machine and not the other? I suspect the answer will end up being "42." |
©2024 cpdn.org