HadSM4 Error when completed and Uploading
Author | Message |
---|---|
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Just finished a HadSM4 model after 4 and a bit days. As it finished uploading I received a Computation Error, which I thought was strange as it and 3 others had been running fine. On checking, it seems that I was missing a file that I didn't know about: libnsl.so.1. Without this file you get:
Unable to load library hadsm4_se_8.02_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory
I have now installed this file on my Linux install and hope the other 3 work units will now be OK and I won't have 3 more failures (that would waste the equivalent of over 12 days of computation). They have between 5 and 13 hours to go. Conan (PS: Oh, just noticed it is my 100th post since I joined in 2006.) |
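A quick way to spot this kind of problem before a task fails is to look for "not found" entries in the `ldd` output of the application's shared object. The following is only a sketch: `missing_libs` is a hypothetical helper name, not something shipped by BOINC or the project, and it simply filters `ldd` output supplied on standard input.

```shell
# Hypothetical helper: print the names of libraries that ldd reports
# as "not found". It reads ldd output on stdin, so it also works on
# output saved to a file.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible use, run from the directory holding the application files:
#   ldd hadsm4_se_8.02_i686-pc-linux-gnu.so | missing_libs
```

If the helper prints nothing, every library named by `ldd` was resolved; any name it does print would need to be installed through the distribution's package manager.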
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"On checking it seems that I was missing a file that I didn't know about"
A quick search suggests that it isn't installed by default on Red Hat, so it may well not be on Fedora either. It seems that on Ubuntu and its variants it is there on a default installation. Thanks for bringing this to attention. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"On checking it seems that I was missing a file that I didn't know about"
I am running Red Hat Enterprise Linux release 8.6 (Ootpa). I do not know if it is default or not, but the required libraries are:
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadsm4_um_8.02_i686-pc-linux-gnu
    linux-gate.so.1 (0xf7f17000)
    libdl.so.2 => /lib/libdl.so.2 (0xf7efd000)
    libm.so.6 => /lib/libm.so.6 (0xf7e2b000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7e0a000)
    libc.so.6 => /lib/libc.so.6 (0xf7c62000)
    /lib/ld-linux.so.2 (0xf7f19000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadsm4_se_8.02_i686-pc-linux-gnu.so
    linux-gate.so.1 (0xf7edc000)
    libnsl.so.1 => /lib/libnsl.so.1 (0xf7e25000)
    libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7c92000)
    libm.so.6 => /lib/libm.so.6 (0xf7bc0000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ba3000)
    libc.so.6 => /lib/libc.so.6 (0xf79fb000)
    /lib/ld-linux.so.2 (0xf7ede000)
This may be a clue as to where I got it:
localhost:jeandavid8[/lib]$ ls -l libnsl.so.1
lrwxrwxrwx. 1 root root 14 Jun 8 15:17 libnsl.so.1 -> libnsl-2.28.so
localhost:jeandavid8[/lib]$ rpm -qf libnsl-2.28.so
libnsl-2.28-189.5.el8_6.i686
https://centos.pkgs.org › 8 › centos-baseos-x86_64 › libnsl-2.28-164.el8.i686.rpm.html
libnsl-2.28-164.el8.i686.rpm CentOS 8 Download - pkgs.org
Installed size: 157.62 KB. This package provides the legacy version of libnsl library, for accessing NIS services. This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support."
I wonder if that means this particular batch was set up on a machine using the older library, and that is why I haven't come across that particular error before, either personally or in others' posts? If the consensus is that this is likely, I will alert the project. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support."
@Dave That was the error I got a couple of years back when I installed Fedora 32? to try to help troubleshoot a problem a user was having with that distribution. That was when I sent you the instructions for updating the post on 32-bit libraries to include Fedora and this libnsl.
@Conan I believe that error crops up in upload transfers, so if it resulted in one or more uploads from a task not making it to the servers, the task will likely be marked as an error in the database. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"That was the error I got a couple years back when I installed Fedora 32? to try to help troubleshoot a problem a user was having with that distribution. That was when I sent you the instructions for updating the post on 32bit libraries to include Fedora and this libnsl in the 32bit library instruction post."
Thanks George, I should have checked before answering. I pasted your instructions for Fedora without really reading them when I updated that thread. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support."
For completeness, here are the models I have on my Red Hat Enterprise Linux release 8.6 (Ootpa) machine.
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_8.09_i686-pc-linux-gnu
    linux-gate.so.1 (0xf7eee000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7eb8000)
    libdl.so.2 => /lib/libdl.so.2 (0xf7eb3000)
    libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7d20000)
    libm.so.6 => /lib/libm.so.6 (0xf7c4e000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7c31000)
    libc.so.6 => /lib/libc.so.6 (0xf7a89000)
    /lib/ld-linux.so.2 (0xf7ef0000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_se_8.09_i686-pc-linux-gnu.so
    linux-gate.so.1 (0xf7eee000)
    libnsl.so.1 => /lib/libnsl.so.1 (0xf7d9f000) <---<<<
    libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7c0c000)
    libm.so.6 => /lib/libm.so.6 (0xf7b3a000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7b1d000)
    libc.so.6 => /lib/libc.so.6 (0xf7975000)
    /lib/ld-linux.so.2 (0xf7ef0000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadcm3s_um_8.36_i686-pc-linux-gnu
    linux-gate.so.1 (0xf7f6f000)
    libdl.so.2 => /lib/libdl.so.2 (0xf7f55000)
    libm.so.6 => /lib/libm.so.6 (0xf7e83000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7e62000)
    libc.so.6 => /lib/libc.so.6 (0xf7cba000)
    /lib/ld-linux.so.2 (0xf7f71000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadcm3s_se_8.36_i686-pc-linux-gnu.so
    linux-gate.so.1 (0xf7fb0000)
    libz.so.1 => /lib/libz.so.1 (0xf7ef7000)
    libnsl.so.1 => /lib/libnsl.so.1 (0xf7edb000) <---<<<
    libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7d48000)
    libm.so.6 => /lib/libm.so.6 (0xf7c76000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7c59000)
    libc.so.6 => /lib/libc.so.6 (0xf7ab1000)
    /lib/ld-linux.so.2 (0xf7fb2000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_um_8.52_i686-pc-linux-gnu
    linux-gate.so.1 (0xf7f2e000)
    libdl.so.2 => /lib/libdl.so.2 (0xf7f14000)
    libm.so.6 => /lib/libm.so.6 (0xf7e42000)
    libpthread.so.0 => /lib/libpthread.so.0 (0xf7e21000)
    libc.so.6 => /lib/libc.so.6 (0xf7c79000)
    /lib/ld-linux.so.2 (0xf7f30000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_se_8.52_i686-pc-linux-gnu.so
    linux-gate.so.1 (0xf7f9f000)
    libnsl.so.1 => /lib/libnsl.so.1 (0xf7e50000) <---<<<
    libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7cbd000)
    libm.so.6 => /lib/libm.so.6 (0xf7beb000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7bce000)
    libc.so.6 => /lib/libc.so.6 (0xf7a26000)
    /lib/ld-linux.so.2 (0xf7fa1000) |
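The listings above show libnsl.so.1 only against the `_se_` shared objects. To check a whole directory of application files without reading each listing by eye, something along these lines could be used. It is a sketch under assumptions: `needs_lib` is a hypothetical helper name, and the file pattern in the commented loop is a guess based on the file names quoted in this thread.

```shell
# Hypothetical helper: succeed (exit 0) if the ldd output on stdin lists
# the library named in $1, whether or not it was resolved to a file.
needs_lib() {
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

# Possible survey loop (directory and file pattern are assumptions):
#   cd /var/lib/boinc/projects/climateprediction.net
#   for f in *i686-pc-linux-gnu*; do
#       ldd "$f" 2>/dev/null | needs_lib libnsl.so.1 && echo "$f needs libnsl.so.1"
#   done
```

Matching on the first field rather than on the whole line means a library that is listed but "not found" still counts as needed, which is the case that matters here.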
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Yes, even though I updated my files, all 3 (of 4) SM4 work units have now finished in an error. So even though all looks good, trickles are reported and I got some credit, something is missing and it errors out. Well, 1 to go in less than 5 hours and I will be done, with 12 days of crunching wasted (each WU ran 4 days) and no valid results. I may take another break from the project unless a few types I have not run yet get work, in which case I will try them. Thanks all for your help. Conan |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I wonder what your problem is. Here are the results for one of a group of four I received recently. Currently, four more are running and I expect them to complete successfully too.
Task: 22217467
Name: hadsm4_a0ye_201310_6_933_012144700_0
Workunit: 12144700
Created: 11 Jul 2022, 10:57:52 UTC
Sent: 14 Jul 2022, 6:24:03 UTC
Report deadline: 26 Jun 2023, 11:44:03 UTC
Received: 16 Jul 2022, 22:31:14 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 1511241
Run time: 2 days 5 hours 30 min 38 sec
CPU time: 2 days 4 hours 52 min 58 sec
Validate state: Valid
Credit: 9,616.92
Device peak FLOPS: 6.58 GFLOPS
Application version: UK Met Office HadSM4 at N144 resolution v8.02 i686-pc-linux-gnu
And this one, not yet complete, has provided three trickles:
Task: 22221604
Name: hadsm4_a0q5_201310_6_933_012144403_1
Workunit: 12144403
Created: 16 Jul 2022, 9:06:07 UTC
Sent: 16 Jul 2022, 9:18:45 UTC
Report deadline: 28 Jun 2023, 14:38:45 UTC
Received: ---
Server state: In progress
Outcome: ---
Client state: New
Exit status: 0 (0x00000000)
Computer ID: 1511241 |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
@Jean-David His Fedora 36 install did not include the libnsl file that is apparently needed. This results in upload failures (for some reason). He has installed this file now and the 6th and final file did upload correctly. However, since the other five monthly zip files did not go up (before he installed libnsl), BOINC marked the results as errors.
@Conan Looking at the stderr on your task webpages, the 6th zip must have been uploaded successfully, but the other 5 monthly zips weren't, so BOINC marked the result as an error. Now that libnsl is installed, you shouldn't have any more errors of this type. |
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
I do not have an inkling of anyone's faults, but one of my WUs has been trying to upload for the last 48 hours while another has uploaded. Why would one upload and the other not? Linux Mint. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Please post a link, and I'll ask the project to check. It's only a bit after 7.30am there at the moment. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
There is some work being done on the servers at the moment, which is delaying the next batch of HadSM4 till later in the week, but that doesn't make much sense as the cause when others have gone through OK. If that one is from a batch number that is different from all the working ones, it may be that the server for that batch, wherever in the world it is, is down or otherwise not functioning properly. The link will enable that to be checked, and Andy could then ping a message to the relevant university to kick said server. |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
@Jean-David Thanks geophi, I decided to check back on a few WUs I ran back in May 2021 and found they had failed for the same reason. I had Fedora 31 at the time, but apparently I did not check why the work units were marked invalid. If I had checked, I could have fixed this issue last year and had 4 successful results now instead of 4 failures. I will have to check why things fail a bit better, it seems. You live and learn. Conan |
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
The server was taking a short break. It has been uploaded now. Thank you. |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Just had a work unit complete without error, so that missing libnsl file was the problem. Conan |
©2024 cpdn.org