Questions and Answers : Unix/Linux : Another problem hadam3pm2 MOSES global -- all fail
Author | Message |
---|---|
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
There's another problem with "hadam3pm2 (hadam3p model with MOSES II land scheme) (currently no graphics) (Linux only)". All the more recent models fail because they quit after 9 uploads and never upload the last "***.10.zip". So the researchers get 9 out of 10 uploads, but the models are reported as failed. See my status page for many examples. These are the models with "blah blah _p** 1991" in their names. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
[Quote: There's another problem with "hadam3pm2 (hadam3p model with MOSES II land scheme) (currently no graphics) (Linux only)"] Yep. I reported the "p" series problem to the programmers a few days ago. Haven't heard anything back yet. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Thanks |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Task Id hadam3pm2 (hadam3pm2) Just started playing around with Linux on a spare laptop, so I thought I'd run some CPDN stuff. After seeing some strange %run stats, I found this thread and wondered if it's worth continuing the run if they all fail without uploading the last zip. If they're then reallocated, it would be a lot of wasted CPU time. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
IMO, the "global only" models are not worth running unless there are no interruptions where the model is removed from memory. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
[Quote: IMO, the "global only" models are not worth running unless there are no interruptions where the model is removed from memory.] I don't run them for that reason. They are long enough that, with the speed of my machines, it is unlikely that I am going to get through the run time without having to reboot for some reason. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Thanks. I'll just let this one run & see what happens. I'm unlikely to get a power failure where I live & it's something of an experimental box for Linux. |
Send message Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0 |
I've 100% failure rate with the HadAM3P models. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1363686&offset=0&show_names=0&state=5 Linux Users Everywhere @ BOINC |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
[Quote: I've 100% failure rate with the HadAM3P models.] Looking at stderr, it looks like some problem with libz/zlib. Run "sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu" and "sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so". Perhaps the 32-bit version did not get installed? |
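For anyone following along, here is a rough sketch of how the ldd output suggested above could be filtered down to just the unresolved libraries. The helper name `missing_libs` is made up for illustration, and it assumes the usual glibc ldd wording of "=> not found" for a library that cannot be resolved.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on output saved to a file.
# (missing_libs is a hypothetical helper name, not part of BOINC or CPDN.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage, run from the directory holding the executables:
#   ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so | missing_libs
```

If nothing is printed, ldd resolved every dependency it was asked about, which does not by itself rule out other causes of the failure.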
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
I had this recently on a beta task, but only on one of two tasks in a set. The other one crashed for a different reason. This is different from the normal missing-library problems in that it happens quite a long way into the task: 198,215.10 seconds in the one I looked at on Desti's computer. It was similar on the beta task I had with it. I haven't had any new beta tasks since, so I hadn't gotten around to doing an ldd there to check, but it seems strange that it doesn't fail in the first few seconds if it is the normal missing 32-bit lib problem. I have just done an ldd on all the executables in projects/cpdnbeta and nothing showed up as missing. Will track down the relevant bits and post on the beta mailing list. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Dave, check this out: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1354308&offset=80&show_names=0&state=0 It is one of yours that went all the way to the end, yet failed. It has the same type of zlib errors. It looks like all the MOSES EU models on that PC do. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Thanks for that. I had put it down to the fact that the computer in question was shutting down from time to time. More recently it has been running continuously, but it has not had any of these model types since, bar one which is less than halfway through. I have just gone through all the executables with ldd and it has not shown any missing dependencies. When I Google for "missing libz.so1", it is suggested I run "sudo apt-get install lib32z1", which tells me that the newest version is already installed. I wonder if the executable hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so is corrupted on that machine? I just noticed that ldd on that one gives a "no such file or directory" message on that machine but not on this one. I will suspend computation on it, copy the file from here and see what happens. EDIT: I will delete the equivalent file from projects/cpdnbeta/ to force it to download again and see if that sorts out the problem there. |
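One way to test the corruption theory before copying files around is to compare a checksum of the executable on both machines. This is only a sketch: `file_sum` is a made-up helper name, and it uses the POSIX `cksum` CRC, which is good enough to spot accidental damage but is not a cryptographic hash.

```shell
#!/bin/sh
# Print "<crc>:<size>" for a file. Reading via stdin keeps the file name
# out of cksum's output, so results from two machines compare directly.
# (file_sum is a hypothetical helper name.)
file_sum() {
    cksum < "$1" | awk '{ print $1 ":" $2 }'
}

# Possible usage, run on each machine and compared by eye:
#   file_sum hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so
```

If the two values differ, the copies are not byte-identical; if they match, the file itself is probably fine and the problem lies elsewhere on that machine.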
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
[Quote: I've 100% failure rate with the HadAM3P models.] Desti, it seems that most of your failed WUs got to the very end and earned maximum credit; it's only at the final upload that they fail. Obviously there's a missing 32-bit zlib, i.e. libz.so.1, aka (32-bit) libz.so.1.2.8. Try "sudo find /lib -name 'libz*'"; there should be both 64-bit and 32-bit versions. My guess is that the failure happens when libz is called through BOINC/projects/climateprediction.net/hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so at the final upload. Maybe zlib isn't needed until the final upload? If zlib is called (with failing parameters) only at final upload time, that would explain why your CPDN WUs fail at the end. As for Gentoo, I have no clue, but I think if you google "gentoo multilib" and install the 32-bit libs mentioned there, it will probably help. Thanks for your crunching, and no, it's not all wasted with this final upload failure. HTH |
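The find command above lists the libz files but does not say which are 32-bit. As a rough sketch, the ELF header can be read directly: the byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The helper name `elf_class` is made up for illustration; the `file` utility reports the same thing if it is installed.

```shell
#!/bin/sh
# Report whether a file is a 32-bit or 64-bit ELF object by reading its
# header. (elf_class is a hypothetical helper name.)
elf_class() {
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    case "$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Possible usage together with the find command from the post:
#   find /lib -name 'libz.so*' | while read -r f; do
#       printf '%s: %s\n' "$f" "$(elf_class "$f")"
#   done
```

On a multilib system one would expect at least one libz entry reported as 32-bit; where the 32-bit copy lives varies by distribution, so the search path may need adjusting.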
Send message Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0 |
Thanks, I added 32 bit version of zlib to my build. Linux Users Everywhere @ BOINC |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Dave, this is what it looks like on my Linux Mint 15 PC (equivalent to Ubuntu 13.04):
sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so
    linux-gate.so.1 => (0xf77d5000)
    libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf7717000)
    libnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf76fd000)
    libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7617000)
    libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf75eb000)
    libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf75cd000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7424000)
    /lib/ld-linux.so.2 (0xf77d6000)
sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
    linux-gate.so.1 => (0xf7791000)
    libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7757000)
    libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7752000)
    libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf766c000)
    libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7640000)
    libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf7622000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7479000)
    /lib/ld-linux.so.2 (0xf7792000) |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
sudo ldd hadam3prm3pm2t_eu_se_7.01_i686-pc-linux-gnu.so
    linux-gate.so.1 => (0xf77ca000)
    libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf76fa000)
    libnsl.so.1 => /lib/i386-linux-gnu/libnsl.so.1 (0xf76df000)
    libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf75e9000)
    libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf759c000)
    libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf757f000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf73c4000)
    /lib/ld-linux.so.2 (0xf77cb000)
sudo ldd hadam3prm3pm2t_eu_7.01_i686-pc-linux-gnu
    linux-gate.so.1 => (0xf77a7000)
    libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf775e000)
    libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7759000)
    libstdc++.so.6 => /usr/lib/i386-linux-gnu/libstdc++.so.6 (0xf7663000)
    libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7616000)
    libgcc_s.so.1 => /lib/i386-linux-gnu/libgcc_s.so.1 (0xf75f9000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf743e000)
    /lib/ld-linux.so.2 (0xf77a8000)
Apart from the address strings at the end, that looks identical to what I have. |
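Since the load addresses in parentheses differ from run to run, a by-eye comparison like the one above can be made mechanical by stripping them first. A minimal sketch follows; `strip_addrs` and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Remove the trailing "(0x...)" load address from each line of ldd output
# so that output from two machines can be compared with diff.
# (strip_addrs is a hypothetical helper name.)
strip_addrs() {
    sed -E 's/ *\(0x[0-9a-f]+\)$//'
}

# Possible usage with ldd output saved on each machine
# (mint.txt and ubuntu.txt are placeholder file names):
#   strip_addrs < mint.txt   > mint.clean
#   strip_addrs < ubuntu.txt > ubuntu.clean
#   diff mint.clean ubuntu.clean && echo "same libraries and paths"
```

Any line diff still reports would point at a library resolving to a different path, or not resolving at all, on one of the machines.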
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
Latest EU model completed. I have re-installed Ubuntu on the machine; however, I have gone for Xubuntu rather than Kubuntu this time, so it matches my desktop installation exactly, apart from having rather fewer programs installed on it. I have not been able to work out why the previous KDE installation was giving the missing library message and the XFCE one doesn't. If anyone has any clues, I would be interested to know. I expect that the beta tasks I had which were giving the same error are also going to complete now. If I had a third machine to hand, I might try setting it up with KDE, then clone the BOINC directories and run some tasks with it not connected to the interweb to try and work out what the problem was. As it is, for the time being I am not going to change the laptop back and risk more tasks finishing and then failing right at the end. |
©2024 cpdn.org