Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion ***
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It could be permissions. Compare the faulty one with a working one. You may have to change the owner (chown) and/or the read/write permissions (chmod). But nice write-up, Digby. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
From a terminal window, go into the /scratch/wes/BOINC/projects/climateprediction.net/ directory and type "ls -l hadam3* hadrm3*". Do the files hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip, hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip and hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip show up? Les may be right that these files aren't being written to that directory because of permission problems, although one would expect some other error to be listed in stderr or in the BOINC Manager message log if that were the case. |
Send message Joined: 24 Aug 08 Posts: 7 Credit: 37,536,564 RAC: 163 |
cn96 wes:wes /scratch/wes/BOINC/projects 65> cd /scratch/wes/BOINC/projects/climateprediction.net/ cn96 wes:wes /scratch/wes/BOINC/projects/climateprediction.net 66> ls -l hadam3* hadrm3* -rwxr-xr-x 1 wes wes 2355035 Nov 11 01:33 hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip -rwxr-xr-x 1 wes wes 2664840 Nov 11 01:33 hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu -rwxr-xr-x 1 wes wes 75730 Nov 11 01:33 hadam3prm3pm2t_eu_data_7.01_i686-pc-linux-gnu.zip -rwxr-xr-x 1 wes wes 3771315 Nov 11 01:33 hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip -rwxr-xr-x 1 wes wes 2359300 Nov 11 01:33 hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip The files are there, and the permissions seem to be OK. Any other idea? |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
What happens if you go to /scratch/wes/BOINC/projects/climateprediction.net/ and try "unzip hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip"? If unzipping manually works, will BOINC then run the tasks? No idea if this is helpful, but I'm running out of ideas. |
Send message Joined: 24 Aug 08 Posts: 7 Credit: 37,536,564 RAC: 163 |
Yes, I can unzip them. I also tried unzipping them in /scratch/wes/BOINC/slots/27/, and that worked too. I also tried to execute hadam3p_eu_um_7.01_i686-pc-linux-gnu, but it requires some arguments, and I have no idea what they should be. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Could you tell us the numbers of the 2 computers in question, please? Also, have you re-booted after all of this? |
Send message Joined: 24 Aug 08 Posts: 7 Credit: 37,536,564 RAC: 163 |
1371629 and 1371491. They were last booted 180 days ago. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
cn96 wes:wes /scratch/wes/BOINC/projects 65> cd /scratch/wes/BOINC/projects/climateprediction.net/ Are any other projects currently running successfully on the two problem systems? If so, are there any differences in permissions or ownership of the project directories for those projects? I notice that these files are owned by user:group wes:wes rather than by boinc:boinc, as would be the case on a default install of BOINC from an Ubuntu repository. Given that the default install also goes in /var/lib/boinc-client, a non-repository install may explain these differences. Are the location and ownership of the various directories exactly the same on the machines you've got that are running CPDN successfully? If they are, I'm at a loss to explain what you're seeing. Good luck with solving this - Al. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Some more thoughts. A re-boot can sometimes fix problems. You may have a corrupted BOINC installation. You may have a mixed 32/64-bit installation, where you installed one version and then upgraded to the other in the same location; in Windows at least, the two varieties should be installed in different locations. You may have a mixed-source installation, both Berkeley and repository; again, these go in different locations. If/when you decide to re-install the OS, BOINC, or both, find the account XML files (one per project) and save them. After the new install, put the file(s) back in the same location. Then the project(s) will know who you and your computer(s) are. For CPDN, the file is: account_climateprediction.net.xml |
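The advice above about saving the account files before a re-install could be scripted roughly as follows. This is only a sketch, not an official BOINC tool: the function name backup_account_files and both directory arguments are hypothetical, and it assumes the account files sit in the top level of the BOINC data directory with names matching account*.xml.

```shell
# Hypothetical helper: copy the BOINC account files (one per project)
# from a BOINC data directory into a backup directory.
# Usage: backup_account_files /path/to/BOINC /path/to/backup
backup_account_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    copied=0
    for f in "$src"/account*.xml; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -f "$f" ] || continue
        cp -p "$f" "$dest"/ || return 1
        copied=$((copied + 1))
    done
    echo "$copied file(s) copied"
}
```

After the new install, copying the saved files back into the same place in the BOINC data directory completes the procedure described in the post.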
Send message Joined: 24 Aug 08 Posts: 7 Credit: 37,536,564 RAC: 163 |
I had previously (3 times) tried reinstalling BOINC (boinc_7.2.42_x86_64-pc-linux-gnu.sh). It made no difference. There is no other installation of BOINC on these machines. This is the same BOINC installation that is running on other machines where I have no problem running CPDN. If it were the BOINC installation, I would expect other projects also to fail. However, the other projects I am running work fine: FIND@Home, Poem@Home, World Community Grid, malariacontrol.net. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I'm afraid that I'm out of ideas. Sorry. |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
If you haven't tried starting BOINC from the Terminal (I don't read that you have in this thread) a lot of "errors/warnings" that otherwise aren't visible will show up. In my case, the command is: cd "/home/bob/BOINC" && exec ./boincmgr $@ Just be careful you don't close the Terminal because that will terminate the BOINC manager. This procedure has always led to solutions to every problem I have had with BOINC or the projects. |
Send message Joined: 24 Aug 08 Posts: 7 Credit: 37,536,564 RAC: 163 |
Thanks for the suggestion, but it doesn't tell me anything new. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Interestingly, this computer http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1353448 is crashing everything with <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> process exited with code 2 (0x2, -254) </message> <stderr_txt> Process creation (../../projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu) failed: Error -1, errno=2 execv: No such file or directory </stderr_txt> ]]> And again, the user's other Linux box seems to be completing the majority of tasks. I don't know whether this is related or not. |
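One common reason for execv to report "No such file or directory" for a file that clearly exists is that the kernel cannot find the program's ELF interpreter, which for a 32-bit x86 binary is normally /lib/ld-linux.so.2. Whether that is the cause on the hosts discussed here is not established in the thread; the sketch below only shows how one might check. The helper names elf_class and has_32bit_loader are hypothetical.

```shell
# Hypothetical helper: print "32", "64" or "unknown" for a file,
# based on the EI_CLASS byte (offset 4) of its ELF header.
elf_class() {
    # Bytes 0-3 of an ELF file are the magic number 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo unknown; return 1; }
    class=$(od -An -tx1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        01) echo 32 ;;
        02) echo 64 ;;
        *)  echo unknown; return 1 ;;
    esac
}

# Hypothetical helper: succeed if the 32-bit loader exists.
# The default path is the usual one for 32-bit x86 glibc binaries.
has_32bit_loader() {
    [ -e "${1:-/lib/ld-linux.so.2}" ]
}

# Example use, run from the project directory:
#   elf_class hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
#   has_32bit_loader || echo "32-bit loader missing"
```

If elf_class reports 32 and the loader is missing, installing the distribution's 32-bit compatibility libraries is the usual next step.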
Send message Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
Good catch Dave... Thyme has contributed a lot over the years and his last post was http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=8096&nowrap=true#52767 He'd definitely like to know about this and perhaps, further down the road, he could share details on how he fixed the problem... Two questions: 1) Does the BOINC server redistribute these crashed models? 2) Can the server flag 'duff' workstations so it does not keep mindlessly sending new tasks to them? Cheers |
Send message Joined: 6 Apr 07 Posts: 2 Credit: 1,454,343 RAC: 13,420 |
Hi there :) Look at these 2 PCs with Kubuntu 15.04 64-bit: 1377044 and 1379152. Some tasks are currently running and sending trickles; others fail, reporting: <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> process exited with code 127 (0x7f, -129) </message> <stderr_txt> ../../projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory </stderr_txt> ]]> Maybe problems with initialization code? The output of ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu follows: linux-gate.so.1 => (0xf77ad000) libpthread.so.0 => /lib32/libpthread.so.0 (0xf776d000) libdl.so.2 => /lib32/libdl.so.2 (0xf7768000) libstdc++.so.6 => /usr/lib32/libstdc++.so.6 (0xf7673000) libm.so.6 => /lib32/libm.so.6 (0xf7626000) libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf7609000) libc.so.6 => /lib32/libc.so.6 (0xf7450000) /lib/ld-linux.so.2 (0xf77ae000) |
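When a 32-bit library is missing, ldd marks it as "=> not found" instead of printing a path, so the failing dependencies can be pulled out of its output mechanically. A minimal sketch, with missing_libs as a hypothetical helper name:

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# libraries that ldd reports as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example use, run from the project directory:
#   ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu | missing_libs
```

An empty result, as with the ldd listing in the post above, suggests that all the shared libraries the model needs are currently installed.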
Send message Joined: 6 Apr 07 Posts: 2 Credit: 1,454,343 RAC: 13,420 |
Answering myself :) I didn't check thoroughly, but these tasks were downloaded before I solved the missing 32-bit libraries problem; the ones that trickle were obtained after the fix. Stupid me, I have had a long absence from this project, sorry :) |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
In answer to #1: yes, it does. The workunit page lists "max # of error/total/success tasks", which indicates how many tasks will be sent out if none has completed successfully up to that point. I imagine that if a work unit with all failed tasks is particularly important or potentially interesting, the scientist could create another work unit with the same parameters to see if one of those tasks could be completed. For #2, we wish. Perhaps a later version of the BOINC server software may have some option like this? I don't know, but it sure would be nice if, say, a PC that has successfully downloaded at least 100 tasks and then has a "reasonable" number of continuous computation or download failures were flagged and put into a searchable database, with its number of allowed downloads reduced. One would have to be careful with this, however, since numerous duff workunits are sometimes generated accidentally, creating lots of download or computation failures. Of course, the best way to get more Linux completions would be to have a 64-bit application. So very many of the failures are due to missing 32-bit libraries on 64-bit computers that I don't doubt the completions per unit time would more than double if a 64-bit application were deployed. Ahhh...to dream. |
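The flagging rule wished for above could look something like the sketch below. It is purely illustrative and is not a feature of the BOINC server software: flag_hosts is a hypothetical helper, the input format (host ID, tasks downloaded, consecutive failures, one host per line) is assumed, and the default of 20 consecutive failures is an arbitrary stand-in for the "reasonable" number in the post.

```shell
# Hypothetical helper: read "host_id downloaded consecutive_failures" lines
# on stdin and print the IDs of hosts that meet both thresholds.
# Usage: flag_hosts [min_downloads] [min_consecutive_failures]
flag_hosts() {
    min_downloads=${1:-100}   # threshold taken from the post
    min_failures=${2:-20}     # assumed value for "reasonable"
    awk -v min="$min_downloads" -v fails="$min_failures" \
        '($2 + 0) >= (min + 0) && ($3 + 0) >= (fails + 0) { print $1 }'
}

# Example use with a made-up host list:
#   printf '%s\n' '1001 250 40' '1002 50 40' | flag_hosts
```

As the post warns, a real implementation would also have to exclude failures caused by bad work units rather than by the host.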
Send message Joined: 4 Jul 15 Posts: 63 Credit: 3,223,760 RAC: 0 |
It's not terribly hard to configure Linux with 32-bit support (at least on Ubuntu and related distros) so that it can run CPDN tasks. But apparently it's difficult enough to cause big problems running models on Linux. It's no secret that nearly all Linux applications these days are 64-bit, which leaves me curious why the CPDN Linux models cling to the older 32-bit design. It sure seems you are right: this is not using the power of Linux computers to their full potential on behalf of CPDN. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
My understanding is that it would take weeks, if not months, to rewrite the code for 64-bit, and as the code is not open source, it would have to be the Met Office, who own it, that did the work. |
©2024 cpdn.org