*** Running 32bit CPDN from 64bit Linux - Discussion ***

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52843 - Posted: 11 Nov 2015, 14:28:55 UTC
Last modified: 11 Nov 2015, 14:29:56 UTC

It could be permissions.
Compare the faulty one with a working one. You may have to change the owner (chown) and/or the read/write permissions (chmod).

But nice write-up, Digby.
ID: 52843

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 52844 - Posted: 11 Nov 2015, 15:47:49 UTC

From a terminal window, go into the

/scratch/wes/BOINC/projects/climateprediction.net/

directory and type

ls -l hadam3* hadrm3*

Do

hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip
hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip
hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip

show up?

Les may be right that these files aren't being written to that directory because of permission problems, although one would expect some other error to be listed in stderr or in the BOINC Manager message log if that were the case.
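
As a rough sketch, the same check can be scripted. This assumes the three 7.01 file names above and the project directory used in this thread; check_cpdn_files and PROJECT_DIR are made-up names for illustration, not part of BOINC.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected CPDN 7.01 files exist
# in a project directory, and show their permissions when they do.
check_cpdn_files() {
    dir=$1
    missing=0
    for f in \
        hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip \
        hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip \
        hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip
    do
        if [ -e "$dir/$f" ]; then
            ls -l "$dir/$f"
        else
            echo "MISSING: $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing, so callers can test the result.
    [ "$missing" -eq 0 ]
}

# Path taken from this thread; adjust to your own BOINC data directory.
PROJECT_DIR=${PROJECT_DIR:-/scratch/wes/BOINC/projects/climateprediction.net}
if [ -d "$PROJECT_DIR" ]; then
    check_cpdn_files "$PROJECT_DIR" || echo "Some files are missing."
fi
```

A file that is present but not readable by the account running BOINC would still show up in this listing, so the permission columns are worth a look as well.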
ID: 52844

Wes
Joined: 24 Aug 08
Posts: 7
Credit: 37,536,564
RAC: 163
Message 52895 - Posted: 16 Nov 2015, 11:06:01 UTC - in response to Message 52844.  

cn96 wes:wes /scratch/wes/BOINC/projects 65> cd /scratch/wes/BOINC/projects/climateprediction.net/
cn96 wes:wes /scratch/wes/BOINC/projects/climateprediction.net 66> ls -l hadam3* hadrm3*
-rwxr-xr-x 1 wes wes 2355035 Nov 11 01:33 hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 wes wes 2664840 Nov 11 01:33 hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
-rwxr-xr-x 1 wes wes 75730 Nov 11 01:33 hadam3prm3pm2t_eu_data_7.01_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 wes wes 3771315 Nov 11 01:33 hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 wes wes 2359300 Nov 11 01:33 hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip


The files are there, and the permissions seem to be OK. Any other idea?
ID: 52895

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,663,251
RAC: 14,512
Message 52896 - Posted: 16 Nov 2015, 11:23:34 UTC - in response to Message 52895.  

What happens if you go to /scratch/wes/BOINC/projects/climateprediction.net/ and try

unzip hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip

If unzipping manually works, will BOINC then run the tasks?

No idea if this is helpful, but I'm running out of ideas.
ID: 52896

Wes
Joined: 24 Aug 08
Posts: 7
Credit: 37,536,564
RAC: 163
Message 52897 - Posted: 16 Nov 2015, 15:02:42 UTC - in response to Message 52896.  

Yes, I can unzip them. I also tried unzipping them in
/scratch/wes/BOINC/slots/27/
That also worked. I also tried to execute hadam3p_eu_um_7.01_i686-pc-linux-gnu but it requires some arguments, and I have no idea what they should be.
ID: 52897

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52898 - Posted: 16 Nov 2015, 21:24:42 UTC

Could you tell us the numbers of the 2 computers in question, please?
Also, have you re-booted after all of this?


ID: 52898

Wes
Joined: 24 Aug 08
Posts: 7
Credit: 37,536,564
RAC: 163
Message 52899 - Posted: 17 Nov 2015, 10:48:49 UTC - in response to Message 52898.  

1371629 and 1371491
They were last booted 180 days ago.
ID: 52899

alanb1951
Joined: 31 Aug 04
Posts: 36
Credit: 9,581,380
RAC: 3,853
Message 52902 - Posted: 17 Nov 2015, 21:44:45 UTC - in response to Message 52895.  

Wes wrote:
> cn96 wes:wes /scratch/wes/BOINC/projects 65> cd /scratch/wes/BOINC/projects/climateprediction.net/
> cn96 wes:wes /scratch/wes/BOINC/projects/climateprediction.net 66> ls -l hadam3* hadrm3*
> -rwxr-xr-x 1 wes wes 2355035 Nov 11 01:33 hadam3p_eu_um_7.01_i686-pc-linux-gnu.zip
> -rwxr-xr-x 1 wes wes 2664840 Nov 11 01:33 hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
> -rwxr-xr-x 1 wes wes 75730 Nov 11 01:33 hadam3prm3pm2t_eu_data_7.01_i686-pc-linux-gnu.zip
> -rwxr-xr-x 1 wes wes 3771315 Nov 11 01:33 hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.zip
> -rwxr-xr-x 1 wes wes 2359300 Nov 11 01:33 hadrm3p_eu_um_7.01_i686-pc-linux-gnu.zip
>
> The files are there, and the permissions seem to be OK. Any other idea?


Are any other projects currently running successfully on the two problem systems? If so, are there any differences in permissions or ownership of the project directories for those projects?

I notice that these files are owned by user:group wes:wes rather than by boinc:boinc, as would be the case on a default install of BOINC from an Ubuntu repository. Given that the default install also goes in /var/lib/boinc-client, a non-repository install may explain these differences.

Are the location and ownership of the various directories exactly the same on the machines you've got that are running CPDN successfully? If they are, I'm at a loss to explain what you're seeing.
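
One way to make that comparison less error-prone is to dump owner, group and mode to a file on each machine and diff the two files. A minimal sketch, assuming GNU stat and find (standard on Linux); show_ownership and BOINC_DIR are hypothetical names, and the path is the one from this thread:

```shell
#!/bin/sh
# Hypothetical helper: print "owner:group mode name" for a directory and
# everything directly inside it (GNU coreutils stat, GNU find).
show_ownership() {
    find "$1" -maxdepth 1 -exec stat -c '%U:%G %a %n' {} +
}

# Path taken from this thread; adjust to your own BOINC data directory.
BOINC_DIR=${BOINC_DIR:-/scratch/wes/BOINC}
if [ -d "$BOINC_DIR/projects" ]; then
    # One file per machine, named after the host, ready to be diffed.
    show_ownership "$BOINC_DIR/projects" | sort > "ownership-$(uname -n).txt"
fi
```

Copying the file from a working machine next to the one from a failing machine and running diff on the pair should show any ownership or mode difference at a glance.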

Good luck with solving this - Al.

ID: 52902

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52903 - Posted: 17 Nov 2015, 21:56:49 UTC - in response to Message 52899.  

Some more thoughts.

A re-boot can sometimes fix problems.

*********************

You may have a corrupted BOINC installation.

*********************

You may have a mixed 32/64 bit installation, where you installed one version, and then upgraded to the other, but in the same location.
In Windows at least, the 2 varieties should be installed in 2 different locations.

*********************

You may have a mixed source installation, both Berkeley and repository.
Again, these go in different locations.

*********************

If/when you decide to re-install either/both the OS or BOINC, find the account xml files, (one per project), and save them.
After the new install, put the file(s) back in the same location. Then the project(s) will know who you and your computer(s) are.

For CPDN, the file is: account_climateprediction.net.xml
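
A minimal sketch of that backup step, assuming the account files sit at the top level of the BOINC data directory. The names backup_account_files and BOINC_DIR and the backup location are hypothetical; the glob account*.xml is deliberately loose so that it picks up one file per attached project.

```shell
#!/bin/sh
# Hypothetical helper: copy every BOINC account file from a data directory
# into a backup directory, preserving timestamps and permissions.
backup_account_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    count=0
    for f in "$src"/account*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp -p "$f" "$dest"/ && count=$((count + 1))
    done
    echo "backed up $count account file(s) to $dest"
}

# Path taken from this thread; adjust to your own BOINC data directory.
BOINC_DIR=${BOINC_DIR:-/scratch/wes/BOINC}
if [ -d "$BOINC_DIR" ]; then
    backup_account_files "$BOINC_DIR" "$HOME/boinc-account-backup"
fi
```

After the reinstall, copying the saved files back into the new data directory before starting BOINC restores the project attachments.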


ID: 52903

Wes
Joined: 24 Aug 08
Posts: 7
Credit: 37,536,564
RAC: 163
Message 52911 - Posted: 19 Nov 2015, 8:56:37 UTC - in response to Message 52903.  

I had previously (3 times) tried reinstalling BOINC (boinc_7.2.42_x86_64-pc-linux-gnu.sh). It made no difference. There is no other installation of BOINC on these machines. This is the same BOINC installation that is running on other machines where I have no problem running CPDN.
If it were the BOINC installation, I would expect other projects also to fail. However, the other projects I am running work fine: FIND@Home, Poem@Home, World Community Grid, malariacontrol.net .

ID: 52911

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52912 - Posted: 19 Nov 2015, 9:07:42 UTC - in response to Message 52911.  

I'm afraid that I'm out of ideas. Sorry.

ID: 52912

WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,512,201
RAC: 928
Message 52916 - Posted: 19 Nov 2015, 16:48:08 UTC

If you haven't tried starting BOINC from the terminal (I don't see that you have in this thread), do so: a lot of errors and warnings that otherwise aren't visible will show up. In my case, the command is:

cd "/home/bob/BOINC" && exec ./boincmgr "$@"

Just be careful you don't close the Terminal because that will terminate the BOINC manager.

This procedure has always led to solutions to every problem I have had with BOINC or the projects.
ID: 52916

Wes
Joined: 24 Aug 08
Posts: 7
Credit: 37,536,564
RAC: 163
Message 52920 - Posted: 20 Nov 2015, 18:01:59 UTC - in response to Message 52916.  

Thanks for the suggestion, but it doesn't tell me anything new.
ID: 52920

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,663,251
RAC: 14,512
Message 52937 - Posted: 24 Nov 2015, 9:07:52 UTC

Interestingly, this computer http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1353448

is crashing everything with

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 2 (0x2, -254)
</message>
<stderr_txt>
Process creation (../../projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu) failed: Error -1, errno=2
execv: No such file or directory

</stderr_txt>
]]>

And again, the user's other Linux box seems to be completing the majority of tasks. I don't know whether this is related or not.
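
For what it's worth, execv can report "No such file or directory" (errno 2) even when the binary exists: the same error is returned when the ELF interpreter the binary asks for is missing, which for i686 executables on a 64-bit system is normally the 32-bit loader /lib/ld-linux.so.2. Whether that is the cause on this host is only a guess. A minimal sketch of a check, with elf_class, diagnose_exec and BIN as hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: print 32, 64 or "not-elf" for a file, based on the
# ELF header (magic 7f 45 4c 46, then EI_CLASS: 01 = 32-bit, 02 = 64-bit).
elf_class() {
    hdr=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case $hdr in
        7f454c4601) echo 32 ;;
        7f454c4602) echo 64 ;;
        *)          echo not-elf ;;
    esac
}

# Hypothetical helper: suggest a likely cause of "execv: No such file or
# directory" for a given binary path.
diagnose_exec() {
    bin=$1
    if [ ! -e "$bin" ]; then
        echo "binary really is missing: $bin"
    elif [ "$(elf_class "$bin")" = 32 ] && [ ! -e /lib/ld-linux.so.2 ]; then
        echo "32-bit binary, but the 32-bit loader /lib/ld-linux.so.2 is absent"
    else
        echo "binary and loader look present; check shared libraries with ldd"
    fi
}

# Path taken from this thread; adjust to your own BOINC data directory.
BIN=${BIN:-/scratch/wes/BOINC/projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu}
if [ -e "$BIN" ]; then
    diagnose_exec "$BIN"
fi
```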
ID: 52937

Digby
Joined: 17 Feb 06
Posts: 89
Credit: 4,309,159
RAC: 0
Message 52938 - Posted: 24 Nov 2015, 16:12:17 UTC - in response to Message 52937.  

Good catch Dave...

Thyme has contributed a lot over the years and his last post was http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=8096&nowrap=true#52767

He'd definitely like to know about this and perhaps, further down the road, he could share details on how he fixed the problem...

Two questions:
1) Does the Boinc Server redistribute these crashed models?
2) Can the server flag 'duff' workstations so it does not keep mindlessly sending new tasks to them?

Cheers
ID: 52938

Maurice Goulois
Joined: 6 Apr 07
Posts: 2
Credit: 1,454,343
RAC: 13,420
Message 52939 - Posted: 24 Nov 2015, 17:34:22 UTC
Last modified: 24 Nov 2015, 17:41:25 UTC

Hi there :)

Look at these 2 PCs, with Kubuntu 15.04 64bit:
1377044 and 1379152

Some tasks are currently running and sending trickles; others fail, reporting:
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 127 (0x7f, -129)
</message>
<stderr_txt>
../../projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

</stderr_txt>
]]>

Maybe problems with initialization code?

Output of ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu follows:

        linux-gate.so.1 =>  (0xf77ad000)
        libpthread.so.0 => /lib32/libpthread.so.0 (0xf776d000)
        libdl.so.2 => /lib32/libdl.so.2 (0xf7768000)
        libstdc++.so.6 => /usr/lib32/libstdc++.so.6 (0xf7673000)
        libm.so.6 => /lib32/libm.so.6 (0xf7626000)
        libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf7609000)
        libc.so.6 => /lib32/libc.so.6 (0xf7450000)
        /lib/ld-linux.so.2 (0xf77ae000)
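
In the listing above every library resolves to a path. When a 32-bit library is absent, ldd prints "not found" for it instead. A minimal sketch that keeps only those lines; missing_libs and BIN are hypothetical names, and the filter reads ldd output on standard input:

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the name of each
# library that ldd could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Path taken from this thread; adjust to your own BOINC data directory.
BIN=${BIN:-/scratch/wes/BOINC/projects/climateprediction.net/hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu}
if [ -e "$BIN" ]; then
    ldd "$BIN" | missing_libs
fi
```

Empty output means ldd resolved every library it listed.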

ID: 52939

Maurice Goulois
Joined: 6 Apr 07
Posts: 2
Credit: 1,454,343
RAC: 13,420
Message 52940 - Posted: 24 Nov 2015, 17:56:23 UTC
Last modified: 24 Nov 2015, 17:59:00 UTC

Answering myself :)

I didn't check thoroughly, but the failing tasks were downloaded before I solved the missing 32bit libraries problem; the ones that trickle were obtained after the fix.

Stupid me; I have had a long absence from this project, sorry :)
ID: 52940

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 52941 - Posted: 24 Nov 2015, 19:09:24 UTC - in response to Message 52938.  


Digby wrote:
> Two questions:
> 1) Does the Boinc Server redistribute these crashed models?
> 2) Can the server flag 'duff' workstations so it does not keep mindlessly sending new tasks to them?
>
> Cheers


In answer to #1, yes it does. The workunit page has a listing of "max # of error/total/success tasks", which gives the number of further tasks that will be sent out if none has completed successfully up to that point. I imagine if a work unit with all failed tasks is particularly important or potentially interesting, the scientist could create another work unit with the same parameters to see if one of those tasks could be completed.

For #2, we wish. Perhaps a later version of the BOINC server software may have some option like this? I don't know, but it sure would be nice if, say, a PC that has successfully downloaded at least 100 tasks and then has some "reasonable" number of continuous computation or download failures were flagged and put into a searchable database, with its number of allowed downloads reduced. One would have to be careful with this, however, since sometimes numerous duff workunits are accidentally generated, creating lots of download or computation failures.

Of course the best way for more Linux completions would be to have a 64 bit application. So very, very many of the failures are due to missing 32bit libraries on 64 bit computers that I don't doubt the completions per unit time would more than double if a 64 bit application was deployed. Ahhh...to dream.

ID: 52941

jrapdx
Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 52942 - Posted: 25 Nov 2015, 10:14:39 UTC - in response to Message 52941.  

It's not terribly hard to configure Linux with 32 bit support (at least on Ubuntu and related distros) so that it is able to run CPDN tasks. But apparently it's difficult enough to cause big problems running models on Linux.

It's no secret that nearly all Linux applications these days are 64 bit, which leaves me curious why CPDN Linux models cling to the older 32 bit design. It sure seems you are right: this is not using the power of Linux computers to their full potential on behalf of CPDN.
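
For the Ubuntu case, a rough sketch of what that configuration involves. The package names libc6:i386 and libstdc++6:i386 are an assumption based on the libraries in the ldd listing earlier in this thread; other releases and distros may name or split them differently. The helper i386_setup_commands is a hypothetical name, and it only prints commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: given the output of `dpkg --print-foreign-architectures`,
# print the commands needed to add 32-bit (i386) runtime libraries.
# Nothing is executed; review the commands and run them yourself.
i386_setup_commands() {
    foreign=$1
    # Unquoted echo folds the one-per-line list into a single line.
    case " $(echo $foreign) " in
        *" i386 "*) ;;  # i386 is already enabled as a foreign architecture
        *) echo "sudo dpkg --add-architecture i386" ;;
    esac
    echo "sudo apt-get update"
    echo "sudo apt-get install libc6:i386 libstdc++6:i386"
}

# Only meaningful on Debian/Ubuntu style systems where dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    i386_setup_commands "$(dpkg --print-foreign-architectures)"
fi
```

Tasks downloaded before the libraries were installed may still fail, as noted earlier in the thread, so it can take a fresh batch of work to confirm the fix.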
ID: 52942

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,663,251
RAC: 14,512
Message 52943 - Posted: 25 Nov 2015, 10:53:43 UTC - in response to Message 52942.  

My understanding is that it would take weeks, if not months, to rewrite the code for 64 bit, and as the code is not open source, it would have to be done by the Met Office, who own it.
ID: 52943