Questions and Answers : Macintosh : how to restart after Leopard upgrade?
Author | Message |
---|---|
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
I suspended Project climateprediction.net before upgrading from Mac OS 10.4 to 10.5.7. Running the old BOINC, a dialog said to reinstall BOINC. I downloaded latest BOINC (6.6.29), copied the old preserved BOINC data folder to ~/Library/Application Support/BOINC Data/. The new BOINC doesn't seem to find the existing projects (5 long-running hadcm3), tries to download new ones. What is the right procedure to restart existing tasks after both an OS and BOINC upgrade? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There are several difficulties here: 1) The people who help out with problems all seem to run non-Mac computers, and know little about Macs. Mac people only seem to post when they have a problem, and don't look at other posts to see what has happened to other people. 2) You haven't said what version of BOINC you had before the upgrade. It's not possible to restore a version 5 backup to a version 6 BOINC install. 3) To get a version 5 backup to work, it's necessary to backup the ENTIRE BOINC structure, including the programs and the data, and with version 6, the ENTIRE BOINC data section. Also, when upgrading from V5 to V6, it's necessary to do so WITHOUT first uninstalling the old (V5) version. And the upgrade from V5 to V6 takes care of moving the data parts to the correct location. It's not necessary to manually copy data afterwards. So, many areas for something to have gone wrong. The best that I can suggest, is to read my post here on backups for Windows, and this faq on BOINC version 6, and work out how this translates to the Mac. Backups: Here |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi srooke Your Mac has done an impressive amount of work for CPDN! Upgrading your OS shouldn't adversely affect the situation as long as you have a correctly made backup. 1. Did you have BOINC v. 5.10.45 before the BOINC upgrade? If you can't remember, please tell us whether you still had the same BOINC version you installed when you first joined CPDN in April 2008. 2. Please go to your backup, open it up and tell us whether, probably at the top of the list of contents, you can see 3 folders/directories called Projects, Slots and Locale. 3. Have you got one Mac or two more or less identical ones? I'd like to know whether these are one or two computers. If you only have one and it's listed twice, in the menu on the left of this page, select Taking part in CPDN, then Your account, then Computers on this account, then click on Merge computers by name, then merge them. 4. I presume you can open up your current 6.6.29 BOINC Manager. Is anything listed in the Projects and Tasks tabs or is it completely empty? If you can answer those questions, preferably numbering your answers, we should be able to tell you which of Les's help posts and instructions you should use. Cpdn news |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
Thanks, Les and mo.v. After some reading, I realized the 6.6.29 install just wasn't seeing the old BOINC Data directory, which had been in ~/Library/Application Support, not under /Library -- possibly because I had been running Mac OS X Server 10.4 before upgrading to standard Mac OS 10.5. I uninstalled the new BOINC, copied the old BOINC Data directory to /Library/Application Support, and reinstalled. This appeared to work, changing all ownerships to boinc_master, and BOINC recognized climateprediction.net on startup, with the 5 existing tasks. I clicked Resume, and 5 processors went to 100% cpu ... however, they all shut down in the next minute. Messages showed each one "finished" successively, e.g. "Restarting task hadcm3istd_crjl_1920_160_06019829_4 using hadcm3i version 604"... "Computation for task hadcm3istd_crjl_1920_160_06019829_4 finished" "Output file hadcm3istd_crjl_1920_160_06019829_4_1.zip for task hadcm3istd_crjl_1920_160_06019829_4 absent" (repeated for many files) It then appeared to request new tasks, but reported five "Message from server: No work available for the applications you have selected. Please check your settings on the web site." Before the upgrade I was running: Mac OS X Server 10.4.11, BOINCManager 5.10.45 After upgrade: Mac OS X 10.5.7, BOINCManager 6.6.29 2. Please go to your backup, open it up and tell us whether, probably at the top of the list of contents, you can see 3 folders/directories called Projects, Slots and Locale. The directories are there with files and subdirectories, but all lowercase (projects, slots, locale). Only the one Mac. Is there a place I can mail the whole Messages file without cluttering this post? The only suspicious thing I can see before the five tasks finishing abruptly is "Can't load library libcudart". |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hello again I'm consulting another moderator, Thyme Lawn, before one of us suggests what you should do in what order. That's because there are two problems - your BOINC installation and a specific known Mac memory issue. The libcudart message may well indicate a third issue that needs to be solved. No need to post any of your BOINC Manager messages. We can see why the models crashed on each model's web page. So don't do anything at the moment; just wait please. Cpdn news |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The standard Mac OS installation sets up a very limited amount of shared memory. Configuring Shared Memory on Mac OS X gives very clear and easy to follow instructions for increasing the amount available (BOINC uses shared memory for a lot of its inter-process communications). There's no need to worry about the libcudart error; see Apple Support Discussions. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thank you for the rapid response, Thyme Lawn. Hi again Srooke When you've done that please post again to say so. I'll then explain how you can get your BOINC installed properly. In the meantime make sure that in the Projects tab CPDN is set to No New Tasks. And please don't delete your earlier backup because you'll need it again. Cpdn news |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
Done (/etc/sysctl.conf) - I had replaced the file after the OS upgrade, but neglected to reboot. shmmax: 16777216, shmall: 4096, shmseg: 32, shmmni: 128. Projects tab set to "No new tasks". Earlier backup intact and awaiting instructions - thanks! |
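The settings reported in this post can be sanity-checked with a little arithmetic. The sketch below assumes that `shmall` is counted in 4096-byte pages while `shmmax` is counted in bytes (an assumption not stated in the thread, so verify it for your OS version); the function name `shm_covers` is made up for illustration.

```shell
#!/bin/sh
# Sketch: check that shmall (in pages) is large enough to cover shmmax (in bytes).
# Assumption: a 4096-byte page size; adjust PAGE_SIZE if your system differs.
PAGE_SIZE=4096

# shm_covers SHMMAX_BYTES SHMALL_PAGES
# Prints "ok" when shmall * PAGE_SIZE >= shmmax, otherwise "too small".
shm_covers() {
    shmmax=$1
    shmall=$2
    if [ $((shmall * PAGE_SIZE)) -ge "$shmmax" ]; then
        echo "ok"
    else
        echo "too small"
    fi
}

# Values taken from the post: shmmax 16777216, shmall 4096.
shm_covers 16777216 4096
```

On the Mac itself, the live values can be read with `sysctl -A | grep shm`. As the post notes, edits to /etc/sysctl.conf only take effect after a reboot.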
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
The five older models plus a new HadCM all crashed as we can see on your tasks web page. In the case of all 6 models you'll find the error Insufficient Memory/Stack Space Available! Now that you've sorted out the stack space problem that shouldn't happen again. Even after models have crashed it's still possible to restore them, if necessary again and again, and complete them. The fact that you have projects, slots and locale (you're right, all lower case) all in the BOINC Data directory means that you have an incorrect installation with BOINC 5 and BOINC 6 mixed up. In BOINC 6, projects and slots should be in the BOINC Data directory but locale should be in the BOINC directory. In BOINC 5 everything is together in a single directory. To put this right you need to revert to BOINC 5.10.45, get your models working again on that, and then upgrade to BOINC 6. When we upgrade from BOINC 5 to BOINC 6, BOINC itself migrates all the files from one directory to two. You need to * Completely uninstall your current BOINC 6 * Restore your backup, preferably to where it was before * Then download BOINC 5.10.45 and install it on top of the restored backup * 5.10.45 for Mac is no longer on the BOINC download page or even on the All versions page, so you'll need to get it from the Index of downloads: http://boinc.berkeley.edu/dl/?C=M;O=D It's dated 5 March 2008. Two Mac versions are listed. I expect you'll need the one called boinc_5.10.45_macOSX_universal.zip. The other version called boinc_5.10.45_universal-apple-darwin.zip is less than 1 MB and looks to me too small for a full BOINC download. * Then start up BOINC and your models. Assuming that your backup was made correctly, i.e. after completely closing down BOINC, your models should run. * Upgrade to BOINC 6 whenever you want. Before you upgrade, completely close down BOINC and make a new backup. This isn't strictly necessary; it's a precaution. Leave your BOINC 5 intact. For the upgrade, BOINC needs access to all the BOINC 5 directories and files. * Upgrade to BOINC 6. Get it from the main BOINC index page. The CPDN server won't forget that your models have previously crashed. They will remain classified as crashed on their web pages. But they should complete without problems and will be used by the researchers. Cpdn news |
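The layout rule described in this post (BOINC 5 keeps projects, slots and locale together; a BOINC 6 data directory has projects and slots but no locale) can be turned into a quick check before restoring a backup. This is only a sketch: the function name and the three labels it prints are invented for illustration, and it looks at nothing beyond those three folder names.

```shell
#!/bin/sh
# classify_boinc_dir DIR
# Prints which layout DIR resembles, following the description in the post:
#   "boinc5-style"  projects, slots and locale all in one directory
#   "boinc6-data"   projects and slots present, locale absent
#   "unknown"       anything else (including a missing directory)
classify_boinc_dir() {
    dir=$1
    if [ -d "$dir/projects" ] && [ -d "$dir/slots" ]; then
        if [ -d "$dir/locale" ]; then
            echo "boinc5-style"
        else
            echo "boinc6-data"
        fi
    else
        echo "unknown"
    fi
}

# Example: inspect the system-wide data directory mentioned in this thread.
classify_boinc_dir "/Library/Application Support/BOINC Data"
```

A backup that prints "boinc5-style" is the kind that, per the advice above, should be restored under 5.10.45 and then upgraded, rather than copied into a 6.x data directory.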
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
The five older models plus a new HadCM all crashed ... [snipped] Uninstalled 6.6.29. Made sure old ~/Library/Application Support/BOINC Data was still in place and intact. Downloaded and installed 5.10.45. On startup it wanted to connect to a new project, so I canceled, uninstalled 5.10.45, copied the ~/Library version to /Library, reinstalled 5.10.45. This time it found the existing tasks. Resumed project, saw cpu meters spike then drop, examined Messages. Right after the benchmark results: Restarting task hadcm3istd_crjl_1920_160_06019829_4 using hadcm3i version 604 [repeated for all five tasks] Computation for task hadcm3istd_crjl_1920_160_06019829_4 finished [bunch of .zip files absent messages] Message from server: Completed result hadcm3istd_crjl_1920_160_06019829_4 refused: result already reported as error Belatedly remembered to click No New Tasks after it started but did not finish the five new downloads. $ sysctl -A | grep shm kern.sysv.shmall: 4096 kern.sysv.shmseg: 32 kern.sysv.shmmni: 128 kern.sysv.shmmin: 1 kern.sysv.shmmax: 16777216 I looked on my tasks webpage but there are no entries for 19 May yet, to see what the current cause is. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Unfortunately, in the case of models that have previously crashed, the stderr out report on the models' web pages remains fixed, only showing the causes of the initial crash. New stderr out messages are not added for subsequent events and crashes. The error code doesn't update for subsequent crashes either. This is the way BOINC works and it's a nuisance as there are no clues about the causes of subsequent crashes. I think I'm right about that. I would be very happy to be proved wrong. Are the newly downloaded models running properly? If they are, at least it shows that you successfully corrected the stack space problem. Cpdn news |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
Nothing appears to be running, but I may have clicked No New Tasks before they fully started up. Tasks shows 2 (out of expected 5) but with progress = 100%; Suspend button is available on both of these, but there doesn't appear to be anything to suspend. The file stdoutdae.txt in the Data directory lists, in the file's chronological order: [---] Can't rename client_state_next.xml to client_state.xml; check file and directory permissions [---] rename client_state_next.xml to client_state.xml returned error 2: No such file or directory [---] [error] Couldn't write state file: system rename The BOINC Data directory has permissions: $ ls -ld BOINC\ Data/ drwxrwx--x 34 boinc_master boinc_master 1156 May 19 13:49 BOINC Data/ and contains: $ ls -lt client_state* -rw-rw---- 1 boinc_master boinc_master 74518 May 19 13:49 client_state.xml -rw-rw---- 1 boinc_master boinc_master 74518 May 19 13:43 client_state_prev.xml If this is becoming too much of a time drain, let me know and I'll just scrap the five partial results and start fresh with 6.6.29. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The file stdoutdae.txt in the Data directory lists, in the file's chronological order: That definitely looks like a permissions problem. Check that the BOINC core client is running under the boinc_master account. If it isn't, the process owner must be a member of the boinc_master group. Also check that every directory on the path to the BOINC data directory has global traverse permission (drwxrwx--x, with only the last 'x' being significant). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
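The traverse-permission check described in this post can be scripted instead of inspecting each directory by hand. The following is a minimal sketch, not an official BOINC tool: `check_traverse` is a made-up name, and it simply reads the mode string printed by `ls -ld` for each directory from the given path up to the root.

```shell
#!/bin/sh
# check_traverse PATH
# Walks from PATH up to the root, printing every directory whose mode string
# lacks execute (traverse) permission for "other", i.e. the last of the ten
# mode characters is not "x" (or "t", which is sticky plus execute).
# Returns 0 when every component is traversable by others, 1 otherwise.
# A directory that cannot be listed at all is reported as well.
check_traverse() {
    dir=$1
    rc=0
    while :; do
        # First ten characters of e.g. "drwxrwx--x  34 boinc_master ..."
        mode=$(ls -ld "$dir" 2>/dev/null | cut -c1-10)
        case $mode in
            *x|*t) ;;
            *) echo "no traverse: $dir ($mode)"; rc=1 ;;
        esac
        parent=$(dirname "$dir")
        # dirname of "/" (or ".") is itself, so this ends the walk.
        [ "$parent" = "$dir" ] && break
        dir=$parent
    done
    return $rc
}

# Usage (path taken from this thread):
#   check_traverse "/Library/Application Support/BOINC Data"
```

Any directory it lists is a candidate for the "check file and directory permissions" error quoted from stdoutdae.txt.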
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
It turns out launchd was firing up BOINCManager owned by me, not boinc_master. I managed to change boinc_master's gid to my own using "dscl . -change /Users/boinc_master PrimaryGroupID 31 20" (since Apple no longer uses /etc/group and the gui stuff doesn't show hidden accounts). This resolved the permissions problems. stdoutdae.txt shows starting work on the old tasks, creating shared memory regions, climate model starting, ... then some "Cleaning up from the run...", "Detaching shared memory...", "Model crash detected, will try to restart...", and finally some "Sorry, too many model crashes! :-(" Recall, this was after reinstalling BOINC 5.10.45 in an attempt to resurrect the five suspended tasks prior to upgrading to 6.6.29. BOINC has been down for 6 days after my OS upgrade, and I've been taking a lot of your time. Maybe I should just abandon the attempt to resurrect the five suspended tasks from 5.10.45 and start fresh with 6.6.29? |
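Before changing any account's primary group, it may be worth confirming whether the account that owns the BOINC process already belongs to the boinc_master group, which is the condition given earlier in the thread. The sketch below uses only `id -Gn`; the helper name `in_group` is hypothetical, and it assumes group and user names contain no spaces.

```shell
#!/bin/sh
# in_group GROUP [USER]
# Returns 0 if USER (default: the current user) is a member of GROUP.
# Assumption: names contain no whitespace, since the output of "id -Gn"
# is split on spaces.
in_group() {
    group=$1
    user=${2:-}
    # $user is deliberately unquoted so that an empty value adds no argument.
    for g in $(id -Gn $user 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Usage:
#   in_group boinc_master && echo "current user is in boinc_master"
#   in_group boinc_master someuser || echo "someuser is not in boinc_master"
```

If the check fails, adding the user to the group is a smaller change than the one described in the post, which switched boinc_master's own primary group ID.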
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Now that you've sorted out the permissions problem you could delete the entire contents of your 5.10.45 BOINC directory, restore your old backup into it again, install 5.10.45 again on top of it and let the models have one last attempt to run. You're probably so adept at restoring stuff now that you'll be able to do that in no time at all. If the models run they'll owe their lives to you. If they don't, well, you'll never have to regret not doing your best! Cpdn news |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
Now that you've sorted out the permissions problem you could delete the entire contents of your 5.10.45 BOINC directory, restore your old backup into it again, install 5.10.45 again on top of it and let the models have one last attempt to run. You're probably so adept at restoring stuff now that you'll be able to do that in no time at all. If the models run they'll owe their lives to you. If they don't, well, you'll never have to regret not doing your best! Did exactly that with same result, decided to abandon attempt. I hope the lone critical miraculous insight that could save civilization wasn't lurking in the five abandoned tasks. (Just finished Lovelock's Vanishing Face of Gaia...) Uninstalled 5.10.45, removed BOINC Data, fresh install of 6.6.29, connect to project. Nothing suspicious in stdoutdae.txt except several: "[error] hadcm3istd_csyl_1920_160_06021665_1: negative FLOPs left -2.000000". Then the first "Message from server: No work sent" + "Message from server: No work available for the applications you have selected. Please check your settings on the web site." "Computation for task hadcm3istd_csyk_1920_160_06021664_7 finished" + many "Output file hadcm3istd_csyk_1920_160_06021664_7_1.zip for task hadcm3istd_csyk_1920_160_06021664_7 absent". Then apparently cycling through multiple "Message from server: No work available for the applications you have selected. Please check your settings on the web site." I reviewed all Account preferences, only increasing gfx from 50% to 75% while I was there. Tasks page shows "Compute error"s - something related to the negative FLOPS? I'm suspending until your next comment. [Note: after the OS upgrade I lost the Xcode compilers & libraries since my satellite internet connection has a max bandwidth quota and Xcode was supposed to be 3GB. I get the current Xcode on dvd today, will install, and see if that affects boinc.] No!!! Forgot to reset boinc_master group id. 
Uninstalled, reinstalled, this time using dscl to change gid to my own group before clicking Next to attach to project, with "chown -R 31:20 BOINC\ Data" since there are already some files. No indication of permission problem in the stdoutdae.txt above, but there must have been a problem. Sigh. Error: No work available for the applications you have selected. Too soon after previous attempt(s)? |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
Just to bring this thread up to date for anyone else having problems upgrading from Mac OS X 10.4 (tiger) to 10.5 (leopard): The attempts to revive five hadcm3 tasks suspended before the OS upgrade (running under boinc 5.10.45), described above, all failed. I abandoned the restart attempts, uninstalled boinc, deleted the old data directory, and did a fresh install of BOINC 6.6.29. The first startup failed. I then spotted other posts about a Fortran bug in 10.5 (leopard) for hadcm3, which would account for the failures, since I had only hadcm3 checked in my web preferences. I unchecked hadcm3 and checked all the other tasks that appeared to run on the Mac. This time three out of five task slots were filled with successful hadsm3 tasks, the other two failing with compute errors, and this continued for two days. Next I had to reboot, so I first suspended project, quit boinc, then rebooted. On resuming project, only one of the hadsm3's survived, the other two showing compute error - so on my system there is also some bug involving suspending / resuming hadsm3. Just now, with only one climateprediction.net task (a hadsm3) running out of the expected five (on an 8-core MacPro), I updated project, but messages from server say no work is available for HadSM3 Slab Model, Mid-Holocene, or HADAM3P / reached daily quota of 8 results. Climateprediction.net has been mostly down on my system for two weeks after Leopard upgrade. I sympathize with the shortage of programmers for debugging Mac OS specific, likely Fortran bugs. I do have many years Fortran experience in UNIX, with gdb, gcc, g95, mpif77, and Absoft compilers on this Mac, if there's anything I can do to help. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
This time three out of five task slots were filled with successful hadsm3 tasks, the other two failing with compute errors, and this continued for two days. Next I had to reboot, so I first suspended project, quit boinc, then rebooted. On resuming project, only one of the hadsm3's survived, the other two showing compute error - so on my system there is also some bug involving suspending / resuming hadsm3. The two tasks which ran for some time were HADAM3P (hadam3p_nefj_1983_2_006167193_2 and hadam3p_ndti_1970_2_006166400_2), not HADSM3. The stderr output for those tasks indicates that the worker program was being run with the wrong version of the Fortran runtime library: dyld: Library not loaded: libifcoremt.dylib You've only been allocated HADAM3P tasks since then and they've all failed immediately with the same error (I've no idea why the first pair ran for about 21 hours). At the moment your only options are:
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 11 Apr 08 Posts: 9 Credit: 1,704,991 RAC: 0 |
I selected only HADAM3P and HADCM3 about 4 days ago with project reset, still not getting anything running (combination of errors, daily quota exceeded / no tasks available). Yesterday (31 May) I uninstalled BOINC, deleting the data directory and reinstalled. Finally, this morning, I have 7 HADAM3P tasks running happily, for the first time since upgrading Mac OS X to Leopard. Thanks for all the help! |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Well done! Cpdn news |