climateprediction.net (CPDN) home page
Thread 'Another problem hadam3pm2 MOSES global -- all fail'


Questions and Answers : Unix/Linux : Another problem hadam3pm2 MOSES global -- all fail

Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 51632 - Posted: 14 Mar 2015, 10:52:49 UTC

There's another problem with "hadam3pm2 (hadam3p model with MOSES II land scheme) (currently no graphics) (Linux only)"

All the more recent models fail: they quit after 9 uploads and never upload the last "***.10.zip". So the researchers get 9 out of 10, but the models are reported as failed. See my status page for many examples. These are the models with "blah blah _p** 1991" in their names.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 51636 - Posted: 14 Mar 2015, 15:28:43 UTC - in response to Message 51632.  

Yep. I reported the "p" series problem to the programmers a few days ago but haven't heard anything back yet.

Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 51637 - Posted: 14 Mar 2015, 16:35:07 UTC - in response to Message 51636.  

Thanks
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 52122 - Posted: 28 Jun 2015, 18:10:15 UTC

Task Id hadam3pm2 (hadam3pm2)


Just started playing around with Linux on a spare laptop so thought I'd run some CPDN stuff.
After seeing some strange %run stats, I found this thread and wondered whether it's worth continuing the run if they all fail without uploading the last zip.
If the tasks are then reallocated, it would be a lot of wasted CPU time.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 52123 - Posted: 28 Jun 2015, 18:50:50 UTC

IMO, the "global only" models are not worth running unless they can run without any interruption that removes the model from memory.

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,025,554
RAC: 20,468
Message 52124 - Posted: 28 Jun 2015, 19:13:50 UTC - in response to Message 52123.  

I don't run them for that reason. They are long enough that, at the speed of my machines, I am unlikely to get through the whole run without having to reboot for some reason.
Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 52125 - Posted: 28 Jun 2015, 21:04:10 UTC

Thanks. I'll just let this one run & see what happens. I'm unlikely to get a power failure where I live & it's something of an experimental box for Linux.
Desti

Joined: 6 Aug 04
Posts: 124
Credit: 9,195,838
RAC: 0
Message 52196 - Posted: 8 Jul 2015, 10:29:39 UTC

geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 52202 - Posted: 9 Jul 2015, 0:41:15 UTC - in response to Message 52196.  

I've 100% failure rate with the HadAM3P models.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1363686&offset=0&show_names=0&state=5


Judging from stderr, it looks like a problem with libz/zlib.

Try running

sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so

Perhaps the 32-bit version of zlib did not get installed?
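The same check can be scripted so several files are tested at once. This is only a sketch, assuming it is run from the directory holding the model files (for example the BOINC projects/climateprediction.net directory); check_missing_libs is a hypothetical helper name, not part of BOINC or CPDN. It runs ldd on each file and prints any line that reports an unresolved library. ldd itself does not normally need sudo as long as the files are readable.

```shell
#!/bin/sh
# Sketch: list unresolved shared-library dependencies reported by ldd.
# check_missing_libs is a hypothetical helper name, not part of BOINC or CPDN.

check_missing_libs() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "$f: no such file"
            rc=1
            continue
        fi
        # ldd marks an unresolved library as "libname => not found". A 32-bit
        # file may instead show "not a dynamic executable" when the 32-bit
        # loader is missing (static binaries print this too, so treat it as
        # a hint to investigate, not proof of a missing library).
        problems=$(ldd "$f" 2>&1 | grep -E 'not found|not a dynamic executable' || true)
        if [ -n "$problems" ]; then
            echo "$f:"
            echo "$problems"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_missing_libs hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so should print nothing and return 0 when every dependency resolves.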
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,025,554
RAC: 20,468
Message 52207 - Posted: 9 Jul 2015, 7:57:29 UTC
Last modified: 9 Jul 2015, 8:07:54 UTC

I had this recently on a beta task, but only on one of two tasks in a set; the other one crashed for a different reason. This is different from the usual missing-library problems in that it happens quite a long way into the task (198,215.10 seconds into the one I looked at on Desti's computer). It was similar on the beta task I had with it. I haven't had any new beta tasks since, so I haven't got around to running ldd there yet to check, but it seems strange that it wouldn't fail in the first few seconds if it were the usual missing 32-bit library problem.

Just ran ldd on all the executables in projects/cpdnbeta and nothing showed up as missing. I will track down the relevant bits and post on the beta mailing list.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 52210 - Posted: 9 Jul 2015, 21:41:36 UTC - in response to Message 52207.  
Last modified: 9 Jul 2015, 21:42:14 UTC

Dave,

Check this out

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1354308&offset=80&show_names=0&state=0

One of yours that went all the way to the end, yet failed. It has the same type of zlib errors. It looks like all the MOSES EU models do on that PC.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,025,554
RAC: 20,468
Message 52211 - Posted: 10 Jul 2015, 7:42:03 UTC - in response to Message 52210.  
Last modified: 10 Jul 2015, 8:03:17 UTC

Thanks for that. I had put it down to the computer in question shutting down from time to time. More recently it has been running continuously, but it has not had any of these model types since, bar one which is less than halfway through. I have just gone through all the executables with ldd and it has not shown any missing dependencies.

When I Google "missing libz.so1", the suggested fix is to run "sudo apt-get install lib32z1", which tells me that the newest version is already installed.

I wonder whether the executable hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so is corrupted on that machine. I have just noticed that ldd on that file gives a "no such file or directory" message on that machine but not on this one. I will suspend computation there, copy the file over from this machine, and see what happens.

EDIT: I will also delete the equivalent file from projects/cpdnbeta/ to force it to download again, and see whether that sorts out the problem there.
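One way to confirm whether a file really is corrupted, before replacing it, is to compare checksums of the two copies. A minimal sketch, assuming the coreutils sha256sum tool is available on both machines; matches_checksum is a hypothetical helper name that compares a local file against a hash recorded on the other machine.

```shell
#!/bin/sh
# Sketch: check whether a local file matches a SHA-256 hash taken elsewhere.
# matches_checksum is a hypothetical helper name.

matches_checksum() {
    file=$1
    expected=$2
    # sha256sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(sha256sum "$file" 2>/dev/null | cut -d' ' -f1)
    # Fail if the file could not be read or the hashes differ.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

On the good machine, run sha256sum hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so and note the hash; on the suspect machine, matches_checksum hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so <hash> && echo same || echo differs shows whether the two copies are identical.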
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 52212 - Posted: 10 Jul 2015, 8:30:06 UTC - in response to Message 52196.  

I've 100% failure rate with the HadAM3P models.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1363686&offset=0&show_names=0&state=5


Desti, it seems that most of your failed WUs got to the very end (they got maximum credit); it's only at the final upload that they fail.
Obviously there's a missing 32-bit zlib, that is, libz.so.1, aka the 32-bit libz.so.1.2.8.

Try "sudo find /lib -name 'libz*'"; there should be both 64-bit and 32-bit versions.
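To tell which of the libz files found that way are 32-bit and which are 64-bit, one option is to read the class byte of each file's ELF header (byte 4: 1 means 32-bit, 2 means 64-bit). This is a sketch only: elf_class and list_zlib_copies are hypothetical helper names, and the directory list is an assumption that will differ between distributions. Running "file -L" on each result gives similar information if the file utility is installed.

```shell
#!/bin/sh
# Sketch: report whether each libz copy is a 32-bit or 64-bit ELF file.
# elf_class and list_zlib_copies are hypothetical helper names.

elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte 4 of the header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown class" ;;
    esac
}

list_zlib_copies() {
    # Assumed library directories; adjust for your distribution.
    for dir in /lib /lib32 /usr/lib /usr/lib32 \
               /lib/i386-linux-gnu /lib/x86_64-linux-gnu \
               /usr/lib/i386-linux-gnu /usr/lib/x86_64-linux-gnu; do
        [ -d "$dir" ] || continue
        for lib in "$dir"/libz.so*; do
            # Skip the unexpanded pattern when nothing matches.
            [ -e "$lib" ] || continue
            echo "$lib: $(elf_class "$lib")"
        done
    done
}

list_zlib_copies
```

On a 64-bit system that can run these models, at least one line should say 32-bit; if every libz listed is 64-bit, the 32-bit zlib is probably missing.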

This is only a guess, but the failure may happen when libz is called through BOINC/projects/climateprediction.net/hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so at the final upload. Maybe zlib isn't needed until the very last upload?

If zlib is called (with failing parameters) only at final upload time, that would explain why your CPDN WUs fail at the end.

As for Gentoo, I have no clue, but if you google "gentoo multilib" and install the 32-bit libs mentioned there, that will probably help.

Thanks for your crunching, and no, it's not all wasted despite this final-upload failure.

HTH



Desti

Joined: 6 Aug 04
Posts: 124
Credit: 9,195,838
RAC: 0
Message 52213 - Posted: 10 Jul 2015, 8:50:12 UTC

Thanks, I added the 32-bit version of zlib to my build.
Linux Users Everywhere @ BOINC
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 52219 - Posted: 11 Jul 2015, 3:20:22 UTC

Dave,

This is what it looks like on my Linux Mint 15 PC (equivalent to Ubuntu 13.04):


sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so
linux-gate.so.1 => (0xf77d5000)
libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf7717000)
libnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf76fd000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7617000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf75eb000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf75cd000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7424000)
/lib/ld-linux.so.2 (0xf77d6000)

sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
linux-gate.so.1 => (0xf7791000)
libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7757000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7752000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf766c000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7640000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7622000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7479000)
/lib/ld-linux.so.2 (0xf7792000)
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,025,554
RAC: 20,468
Message 52220 - Posted: 11 Jul 2015, 7:08:52 UTC - in response to Message 52219.  

sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so
linux-gate.so.1 => (0xf77ca000)
libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf76fa000)
libnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf76df000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf75e9000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf759c000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf757f000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf73c4000)
/lib/ld-linux.so.2 (0xf77cb000)


sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
linux-gate.so.1 => (0xf77a7000)
libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf775e000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7759000)
libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7663000)
libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7616000)
libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf75f9000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf743e000)
/lib/ld-linux.so.2 (0xf77a8000)

Apart from the hex addresses at the end, that looks identical to what I have.
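Comparing two ldd listings by eye is error-prone because the hex load addresses in parentheses usually differ between runs and machines (address-space layout randomization moves them around). A small sketch, assuming sed is available and using the hypothetical helper name strip_ldd_addresses, removes those addresses so the listings can be compared with diff.

```shell
#!/bin/sh
# Sketch: drop the "(0x...)" load addresses from ldd output so that
# listings from two machines can be compared with diff.
# strip_ldd_addresses is a hypothetical helper name.

strip_ldd_addresses() {
    # In a POSIX basic regular expression, ( and ) are literal characters.
    sed 's/ *(0x[0-9a-f]*)$//'
}
```

For example, run ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so | strip_ldd_addresses > mine.txt on each PC (the file name is arbitrary), copy one result across, and diff the two; no output from diff means both PCs resolve the same libraries to the same paths.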
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,025,554
RAC: 20,468
Message 52320 - Posted: 24 Jul 2015, 12:51:08 UTC

The latest EU model completed. I have reinstalled Ubuntu on the machine, but this time I went for Xubuntu rather than Kubuntu, so it matches my desktop installation exactly apart from having rather fewer programs installed.

I have not been able to work out why the previous KDE installation was giving the missing-library message when the XFCE one doesn't. If anyone has any clues, I would be interested to know.

I expect that the beta tasks I had which were giving the same error are also going to complete now.

If I had a third machine to hand, I might try setting it up with KDE, cloning the BOINC directories and running some tasks with it not connected to the interweb, to try to work out what the problem was. As it is, for the time being I am not going to change the laptop back and risk more tasks finishing and then failing right at the end.


©2024 cpdn.org