Shared Memory, other thread locked and pinned
Author | Message |
---|---|
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Well, I have a new Mac Pro with 8 CPUs, tried to run CPDN, and the models die. I do have the "fix" in place to raise the shared memory limits. I have 12 GB of main memory and 2.73 TB of free disk space (though it is on a hardware RAID 5 array ...). The memory diagnostics say it is healthy (I did find one stick reporting correctable ECC errors; it is out now) ... Not sure what else to test. Any thoughts? I can post reports if you tell me what you want to see ... |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Paul, In the absence of a proper solution, you may find that the coupled models run correctly (based on adempster's experience). These are selectable in the climateprediction.net set of preferences in your account. Iain [Edit: following your post, I've passed on a comment noting that this seems to be a general Mac problem with slabs.] |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
It took me longer than it should have ... but it does look like you are correct. The other models may all have been Slabs (or I did not have the "fix" in place). But I now have a HadCM3 model with an hour on the clock, which is a step ahead, because the other models never got started. Of course, it also says 1089:29:47 to go ... We will see how it goes. |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Just as a follow-on note, it does appear that all the other models were "Slabs", and I now have a coupled model on the Mac Pro with 67 hours of runtime on its clock ... so that looks decent. A side observation (not sure what it MIGHT mean, and not sure you want me to "blow through" a bunch of models to test, but I will if you ask): my memory usage is LOW. I mean, only a couple of GB out of the 16 GB is being used ... My recollection is that on earlier runs HUGE chunks of memory were allocated and locked. I only had 8 GB then, but I recall it ALL being consumed. So the "shared memory" problem may really be some OTHER memory allocation issue on the Mac Pro (Intel) machines with the slab models. Again, it is YOUR models ... I hate to blow them up just for laughs ... One OTHER note: though I did have ECC errors, they were all corrected, so I am not sure whether that is relevant to the discussion, but I thought I would mention it for completeness. It just seems to me worth chasing if you want Mac machines to participate and run the slab models, given that Iain points out this seems to be a "standard" problem ... And I do have the 16 MB shared memory allocation ... should you wish to check my numbers, I will be glad to post them too ... |
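If it helps when posting those numbers, here is a minimal Python sketch that summarizes the System V shared-memory limits, assuming they are copied from the output of `sysctl kern.sysv` in its usual `name: value` form. The sample values are hypothetical (picked to match a 16 MB allocation, not Paul's real settings), and the assumption that `kern.sysv.shmall` is counted in 4 KiB pages should be checked on the actual machine.

```python
# Minimal sketch: summarize macOS System V shared-memory limits from text
# copied out of `sysctl kern.sysv`. The "name: value" format and the SAMPLE
# numbers are assumptions for illustration, not real values from this thread.

PAGE_SIZE = 4096  # assumption: kern.sysv.shmall is counted in 4 KiB pages


def parse_sysctl(text: str) -> dict[str, int]:
    """Turn lines like 'kern.sysv.shmmax: 16777216' into {'shmmax': 16777216}."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, raw = line.partition(":")
        key = name.strip().rsplit(".", 1)[-1]  # keep only the last component
        values[key] = int(raw.strip())
    return values


def summarize(values: dict[str, int]) -> str:
    """Report the per-segment and total limits in MB, plus the segment counts."""
    shmmax_mb = values["shmmax"] / (1024 * 1024)
    shmall_mb = values["shmall"] * PAGE_SIZE / (1024 * 1024)
    return (f"max segment {shmmax_mb:.0f} MB, total {shmall_mb:.0f} MB, "
            f"{values['shmseg']} segments per process, {values['shmmni']} ids")


SAMPLE = """\
kern.sysv.shmmax: 16777216
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 32
kern.sysv.shmall: 4096
"""

if __name__ == "__main__":
    print(summarize(parse_sysctl(SAMPLE)))
```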
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Hi Paul, I don't know anything about Macs, so this is just a 'generic' suggestion (and probably won't help): if you increase the shared memory settings further, does that make any difference? E.g., perhaps 32 MB of shared memory and 64 segments? Iain had a browse through some other Macs on the project, and he found that the earlier operating system (Darwin 9.1) looked OK with slabs, but many of the machines running Leopard (9.2) seemed to be burning through models. So it might be that Leopard and the Slab models are incompatible? I'm a volunteer and my views are my own. News and Announcements and FAQ |
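The "32 MB and 64 segments" suggestion maps onto the macOS `kern.sysv.*` sysctl limits. Below is a hedged sketch that computes those values in the `name=value` form used by `/etc/sysctl.conf`; `shm_settings` is a hypothetical helper, and setting both `shmmni` (system-wide segment ids) and `shmseg` (segments per process) to the same count is a simplifying assumption, not a recommended configuration.

```python
# Hypothetical helper: build kern.sysv.* lines for a given shared-memory size
# and segment count. Assumes 4 KiB pages (the unit kern.sysv.shmall uses) and
# uses the same count for shmmni and shmseg purely for simplicity.
PAGE_SIZE = 4096


def shm_settings(total_mb: int, segments: int) -> list[str]:
    shmmax = total_mb * 1024 * 1024   # bytes; a whole number of pages
    shmall = shmmax // PAGE_SIZE      # the same limit expressed in pages
    return [
        f"kern.sysv.shmmax={shmmax}",
        "kern.sysv.shmmin=1",
        f"kern.sysv.shmmni={segments}",
        f"kern.sysv.shmseg={segments}",
        f"kern.sysv.shmall={shmall}",
    ]


if __name__ == "__main__":
    print("\n".join(shm_settings(32, 64)))
```

For 32 MB this gives `shmmax=33554432` bytes and `shmall=8192` pages; the two must describe the same total, which is why one is derived from the other.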
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Not sure I know anything about Macs either ... :) It could be ... I know a number of things are different in Leopard, and that is what I am running on both Macs ... Of course, there is no model for the G5 ... so I will have to do other things over there ... :) I am in the middle of a days-long test, so I can't stop to reboot ... but I could try that ... increase the numbers again ... With 16 GB of RAM it is not like I will run out of it soon ... I am only using a little bit of it at the moment ... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I wonder if the 'debug' file, cc_config.xml, might help. Apart from the essential lines, there are two that may be of interest: <app_msg_receive> |
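For anyone who hasn't set one up: cc_config.xml sits in the BOINC data directory, and log flags go inside a `<log_flags>` element under `<cc_config>`. A minimal Python sketch that produces such a file with only the flag named in the post (the second flag is cut off in the original and is not guessed here); `build_cc_config` is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags: list[str]) -> str:
    """Return a cc_config.xml body that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"  # 1 = flag enabled
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(["app_msg_receive"]))
```

The client normally picks up the file when it starts, so save the output as cc_config.xml before (re)starting BOINC.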
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
<app_msg_receive> That just shows the messages passing between the CPDN controller process and the BOINC core client. <app_msg_receive> displays one every second for running tasks (sent to update the progress displayed by BOINC Manager). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Well, you guys have some time to figure it out... :) I have one model running and I think I want to wait till it is done before I do much of anything ... :) I would hate to blow up a perfectly good running model. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Tolu is going to have a look at the slabs-on-Leopard issue next week to see if he can find out what is going wrong there. |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Cool! It could be as simple as permissions/security being different ... I THINK there are also some subtle differences in disk layout ... though I cannot be sure. I *DO* know that even the change in the version of Mac Pro I have, for example, broke the TechTool Pro 4.6.1 toolkit so it will not boot off the CD/DVD ... they asked me about the memory speed ... not sure what that might have to do with it ... but ... Of course, I did leave the debris of half a dozen models in my account for my Mac Pro, so he can look at the logs there ... I am still in the middle of a long disk test, so if it requires a reboot I can't help for a couple of days ... but should he want to test something easy, I can try it ... He can PM me, or if he is a packrat like me he may even have my e-mail address from days of old (it has not changed) ... or he can post here ... I am "watching" this thread ... |
Joined: 5 Jan 06 Posts: 4 Credit: 7,655,256 RAC: 0 |
Hi, I have the same problem on a new Mac Pro (8 cores, 4 GB RAM). Slab models don't start; they fail with the message "Insufficient Memory/Stack Space Available!" The models are: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403801 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403823 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7403827 HadCM3 models are running fine. What I've done after seeing the models crash: (1) suspended all projects on this computer (Einstein@Home and Sudoku@Home as well); (2) exited BOINC; (3) restarted BOINC; (4) resumed climateprediction; (5) waited until three HadCM3 models were present; (6) set climateprediction to "no new work"; (7) resumed the other projects; (8) set the preferences in my account to prefer HadCM3 models. (Maybe it's better to change the preferences before resuming climateprediction.) I hope the Slab bug will be fixed soon. |
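The same sequence can be driven from the command line with BOINC's `boinccmd` tool. Below is a dry-run sketch that only prints the commands; the project URLs are placeholders (not the projects' real master URLs), restarting the client and changing web preferences happen outside `boinccmd`, and `plan` is a hypothetical helper.

```python
# Dry run: build the boinccmd invocations that mirror the manual steps above.
# The URLs are placeholders; substitute each project's master URL.
CPDN = "CPDN_PROJECT_URL"
OTHERS = ("EINSTEIN_PROJECT_URL", "SUDOKU_PROJECT_URL")


def plan(cpdn: str = CPDN, others: tuple[str, ...] = OTHERS) -> list[list[str]]:
    # Suspend every project first.
    steps = [["boinccmd", "--project", url, "suspend"] for url in (cpdn, *others)]
    # Exit the client; starting BOINC again is done outside boinccmd.
    steps.append(["boinccmd", "--quit"])
    steps.append(["boinccmd", "--project", cpdn, "resume"])
    # Manual check here: wait until three HadCM3 tasks have downloaded.
    steps.append(["boinccmd", "--project", cpdn, "nomorework"])
    steps += [["boinccmd", "--project", url, "resume"] for url in others]
    return steps


if __name__ == "__main__":
    for argv in plan():
        print(" ".join(argv))  # print only; nothing is executed
```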
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Hmmm, mine don't even have that much in the stderr output listed ... To *ME* this looks more and more like a permissions problem ... In your case it may be something with the file system. Mine, who knows, as there are no messages in the returned data file. Are you running Tiger or Leopard? My Mac Pro is: Model Name: Mac Pro; Model Identifier: MacPro3,1; Processor Name: Quad-Core Intel Xeon; Processor Speed: 3.2 GHz; Number Of Processors: 2; Total Number Of Cores: 8; L2 Cache (per processor): 12 MB; Memory: 16 GB; Bus Speed: 1.6 GHz; Boot ROM Version: MP31.006C.B05; SMC Version: 1.25f4. The memory sticks are all from the same vendor and look to be pretty close in batch number, so they should be close running mates ... no errors reported, so they are running well ... I am running Leopard with all the latest patches ... |
Joined: 5 Jan 06 Posts: 4 Credit: 7,655,256 RAC: 0 |
Mine is a: Model Name: Mac Pro; Model Identifier: MacPro3,1; Processor Name: Quad-Core Intel Xeon; Processor Speed: 2.8 GHz; Number Of Processors: 2; Total Number Of Cores: 8; L2 Cache (per processor): 12 MB; Memory: 4 GB; Bus Speed: 1.6 GHz; Boot ROM Version: MP31.006C.B05; SMC Version: 1.25f4; Leopard (Mac OS X 10.5.2 (9C7010)), Darwin 9.2.2. Activity Monitor shows 2.30 GB used (1.66 GB free). But the other projects sometimes have the status "Waiting for shared memory". So maybe *@home applications are not able to deal with more than 2 GB of memory, or Mac OS X has a bug. hardy |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
maybe *@home applications are not able to deal with more than 2GB of memory or Mac OSX has a bug. Now there is a happy thought ... It could be the size, though ... I am running TechTool Pro 4.6.1 on a 1.5 TB disk, and it is currently counting NEGATIVE blocks at the far end ... someone used an unsigned long to store the block count (it will always be 0 or more) and read it out as a signed int ... One of the reasons I do not like C as a programming language. Loose typing is what gets you bugs like this ... With stronger typing, the compiler tells you about errors like this, so you have to explicitly use a type cast to convert from one type to another, and then it is on the programmer's head if he makes the wrong choice ... but when you have to do it explicitly and cannot get away with it ... well, thinking about changing from one type to another always made me think of the boundary conditions ... |
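The symptom described, a counter that can never be negative printed as a negative number, is what happens when a 32-bit unsigned value is reinterpreted as a 32-bit two's-complement signed int. A minimal Python sketch of that reinterpretation, assuming a 32-bit counter (the actual TechTool Pro code is not known); `as_signed32` is a hypothetical helper.

```python
import struct


def as_signed32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a two's-complement signed int."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    # Block numbers below 2**31 survive the round trip unchanged...
    print(as_signed32(2_000_000_000))
    # ...but from 2**31 upward they wrap around to negative values.
    print(as_signed32(2**31))
    print(as_signed32(2_285_456_512))
```

Under that assumption, the block number -2,009,510,784 reported later in the thread corresponds to unsigned block 2,285,456,512.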
Joined: 5 Jan 06 Posts: 4 Credit: 7,655,256 RAC: 0 |
What led me to suspect a Mac OS X limit is "Toast". That program always crashed on Tiger on my Mac mini when it reached a virtual memory size of 2 GB while creating DL DVDs (directly or via an image). I reported the bug but got no useful response. So I suspect a similar problem with the Slab models. hardy |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... The shared memory situation can be improved by using the spyhill patch (see the error-code-six sticky at the top of this forum). But it probably won't resolve the slab model problem you are experiencing; all it will do is increase the number of BOINC tasks that you can run simultaneously. |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
The shared memory situation can be improved by using the spyhill patch (see the error-code-six sticky at the top of this forum). I did not have an issue with any other project running on the 8+ GB of memory, before the patch or after ... I *DO* recall a massive allocation of memory, though I am loath to try it again and "burn through" models just for the heck of it ... :) So, we wait in patience ... :) Hey, my disk drive is now testing block number -2,009,510,784 and counting down to -1,364,672,512 |
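A worked check, assuming 512-byte blocks (the block size is not stated in the thread) and reading the final value as -1,364,672,512. A 32-bit unsigned block number $b$ prints as a negative signed value once $b \ge 2^{31}$, which on a 512-byte-block disk happens about 1.10 TB in, i.e. at "the far end" of a 1.5 TB drive; the last block then lands almost exactly at 1.5 TB:

$$
\text{shown} = b - 2^{32} \qquad \text{for } 2^{31} \le b < 2^{32}
$$

$$
2^{31} \times 512\ \text{B} = 2^{40}\ \text{B} \approx 1.10\ \text{TB}
$$

$$
-1{,}364{,}672{,}512 + 2^{32} = 2{,}930{,}294{,}784, \qquad 2{,}930{,}294{,}784 \times 512\ \text{B} \approx 1.50\ \text{TB}
$$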
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Paul, This has now been fixed with the new release of the Mac OS X Intel 5.05 applications. I assume that if you now try to download a slab for your Mac, you'll get the new application version. Iain |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Iain, Well, even with a resource share of 5,000 I could not induce it to pull another model ... on an 8-CPU system ... I don't know how to fiddle the parameters to get it to pull another model ... Maybe if I suspend all other projects? {edit}Dang nab it .. pulled TWO!!!!!{/edit} {edit 2}It looks like the two of them are running ... both have 30 min on their clocks ... {/edit 2} |
©2024 cpdn.org