Thread 'HadSM4 Error when completed and Uploading'

Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 65644 - Posted: 17 Jul 2022, 7:56:13 UTC
Last modified: 17 Jul 2022, 7:59:27 UTC

Just finished a HadSM4 model after 4 and a bit days.
As it finished uploading I received a Computation Error, which I thought was strange as it and 3 others had been running fine.

On checking it seems that I was missing a file that I didn't know about

libnsl.so.1

Without this file you get

Unable to load library hadsm4_se_8.02_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory


I have now installed this file on my Linux system and hope the other 3 work units will now be OK, so that I won't have 3 more failures (that would waste the equivalent of over 12 days of computation).
They have between 5 and 13 hours to go.
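
A minimal sketch of how missing libraries like this could be spotted before a task has run for days. It relies on `ldd` printing `=> not found` for anything the loader cannot resolve. The function names `missing_libs` and `check_dir` are made up for illustration, and the glob `had*` is only an assumption about how the model files are named.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the name of every library
# that the dynamic loader could not resolve ("=> not found").
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Run ldd over each model file (names starting with "had") in a
# directory and report any unresolved libraries.
# Hypothetical helper: the directory layout is an assumption.
check_dir() {
    dir=$1
    for f in "$dir"/had*; do
        [ -f "$f" ] || continue
        missing=$(ldd "$f" 2>/dev/null | missing_libs | tr '\n' ' ')
        if [ -n "$missing" ]; then
            printf '%s is missing: %s\n' "$f" "$missing"
        fi
    done
}
```

Something like `check_dir /var/lib/boinc/projects/climateprediction.net` would then list any problem files; the path depends on how BOINC was installed. If no 32-bit libraries are present at all, `ldd` may instead report that the file is not a dynamic executable, which this sketch does not detect.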

Conan

(PS --Oh just noticed it is my 100th post since I joined in 2006.)

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 65645 - Posted: 17 Jul 2022, 8:39:36 UTC

> On checking it seems that I was missing a file that I didn't know about
>
> libnsl.so.1

A quick search suggests that it isn't installed by default on Red Hat so may well not be on Fedora either. It seems on Ubuntu and variants it is there on a default installation. Thanks for bringing this to attention.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65647 - Posted: 17 Jul 2022, 11:07:31 UTC - in response to Message 65645.  

>> On checking it seems that I was missing a file that I didn't know about
>>
>> libnsl.so.1
>
> A quick search suggests that it isn't installed by default on Red Hat so may well not be on Fedora either. It seems on Ubuntu and variants it is there on a default installation. Thanks for bringing this to attention.


I am running Red Hat Enterprise Linux release 8.6 (Ootpa)

I do not know if it is default or not, but the required libraries are:
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadsm4_um_8.02_i686-pc-linux-gnu
	linux-gate.so.1 (0xf7f17000)
	libdl.so.2 => /lib/libdl.so.2 (0xf7efd000)
	libm.so.6 => /lib/libm.so.6 (0xf7e2b000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xf7e0a000)
	libc.so.6 => /lib/libc.so.6 (0xf7c62000)
	/lib/ld-linux.so.2 (0xf7f19000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadsm4_se_8.02_i686-pc-linux-gnu.so
	linux-gate.so.1 (0xf7edc000)
	libnsl.so.1 => /lib/libnsl.so.1 (0xf7e25000)
	libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7c92000)
	libm.so.6 => /lib/libm.so.6 (0xf7bc0000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7ba3000)
	libc.so.6 => /lib/libc.so.6 (0xf79fb000)
	/lib/ld-linux.so.2 (0xf7ede000)


This may be a clue as to where I got it:
localhost:jeandavid8[/lib]$ ls -l libnsl.so.1 
lrwxrwxrwx. 1 root root 14 Jun  8 15:17 libnsl.so.1 -> libnsl-2.28.so
localhost:jeandavid8[/lib]$ rpm -qf libnsl-2.28.so 
libnsl-2.28-189.5.el8_6.i686

https://centos.pkgs.org › 8 › centos-baseos-x86_64 › libnsl-2.28-164.el8.i686.rpm.html
libnsl-2.28-164.el8.i686.rpm CentOS 8 Download - pkgs.org
Installed size. 157.62 KB. This package provides the legacy version of libnsl library, for accessing NIS services. This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support.
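
The lookup above can be sketched as a small helper that takes the resolved path from `ldd` and passes it to `rpm -qf`. This is only an illustration for RPM-based systems; the function names `lib_path` and `owning_package` are hypothetical.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the path that the named library
# resolved to (prints nothing if it is absent or "not found").
lib_path() {
    awk -v lib="$1" '$1 == lib && $2 == "=>" && $3 != "not" { print $3 }'
}

# Ask the RPM database which installed package owns the resolved file.
# Hypothetical helper; only meaningful where rpm is available.
# $1 = binary or shared object to inspect, $2 = library name to look up.
owning_package() {
    path=$(ldd "$1" 2>/dev/null | lib_path "$2")
    if [ -n "$path" ]; then
        rpm -qf "$path"
    else
        printf '%s does not resolve for %s\n' "$2" "$1"
    fi
}
```

For example, `owning_package hadsm4_se_8.02_i686-pc-linux-gnu.so libnsl.so.1` run from the project directory. Where the library is missing, `dnf provides '*/libnsl.so.1'` should name a package that ships it. Going by the rpm output above, the 32-bit package is called libnsl, so `sudo dnf install libnsl.i686` is the likely fix on Fedora or RHEL, though that is an assumption to verify on your own release.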

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 65648 - Posted: 17 Jul 2022, 11:52:15 UTC
Last modified: 17 Jul 2022, 11:53:33 UTC

> This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support.


I wonder if that means this particular batch was set up on a machine using the older library and that is why I haven't come across that particular error before, either personally or in others' posts?

If the consensus is that is likely, I will alert the project.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 65649 - Posted: 17 Jul 2022, 13:31:43 UTC - in response to Message 65648.  

>> This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support.
>
> I wonder if that means this particular batch was set up on a machine using the older library and that is why I haven't come across that particular error before, either personally or in others' posts?
>
> If the consensus is that is likely, I will alert the project.

@Dave

That was the error I got a couple of years back when I installed Fedora 32(?) to try to help troubleshoot a problem a user was having with that distribution. That was when I sent you the instructions for updating the 32-bit library instruction post to include Fedora and this libnsl library.


@Conan

I believe that error crops up in upload transfers, so if that error resulted in one or more uploads from a task not making it to the servers, the task will likely be marked as an error in the database.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 65650 - Posted: 17 Jul 2022, 16:34:04 UTC - in response to Message 65649.  

> That was the error I got a couple of years back when I installed Fedora 32(?) to try to help troubleshoot a problem a user was having with that distribution. That was when I sent you the instructions for updating the 32-bit library instruction post to include Fedora and this libnsl library.


Thanks George, I should have checked before answering. I pasted your instructions for Fedora without really reading them when I updated that thread.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65651 - Posted: 17 Jul 2022, 21:25:46 UTC - in response to Message 65648.  

>> This library is provided for backwards compatibility only; applications should use libnsl2 instead to gain IPv6 support.
>
> I wonder if that means this particular batch was set up on a machine using the older library and that is why I haven't come across that particular error before, either personally or in others' posts?
>
> If the consensus is that is likely, I will alert the project.

For completeness, here are the models I have on my Red Hat Enterprise Linux release 8.6 (Ootpa) machine.
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_8.09_i686-pc-linux-gnu
	linux-gate.so.1 (0xf7eee000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xf7eb8000)
	libdl.so.2 => /lib/libdl.so.2 (0xf7eb3000)
	libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7d20000)
	libm.so.6 => /lib/libm.so.6 (0xf7c4e000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7c31000)
	libc.so.6 => /lib/libc.so.6 (0xf7a89000)
	/lib/ld-linux.so.2 (0xf7ef0000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_se_8.09_i686-pc-linux-gnu.so
	linux-gate.so.1 (0xf7eee000)
	libnsl.so.1 => /lib/libnsl.so.1 (0xf7d9f000)   <---<<<
	libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7c0c000)
	libm.so.6 => /lib/libm.so.6 (0xf7b3a000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7b1d000)
	libc.so.6 => /lib/libc.so.6 (0xf7975000)
	/lib/ld-linux.so.2 (0xf7ef0000)

localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadcm3s_um_8.36_i686-pc-linux-gnu
	linux-gate.so.1 (0xf7f6f000)
	libdl.so.2 => /lib/libdl.so.2 (0xf7f55000)
	libm.so.6 => /lib/libm.so.6 (0xf7e83000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xf7e62000)
	libc.so.6 => /lib/libc.so.6 (0xf7cba000)
	/lib/ld-linux.so.2 (0xf7f71000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadcm3s_se_8.36_i686-pc-linux-gnu.so
	linux-gate.so.1 (0xf7fb0000)
	libz.so.1 => /lib/libz.so.1 (0xf7ef7000)
	libnsl.so.1 => /lib/libnsl.so.1 (0xf7edb000)   <---<<<
	libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7d48000)
	libm.so.6 => /lib/libm.so.6 (0xf7c76000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7c59000)
	libc.so.6 => /lib/libc.so.6 (0xf7ab1000)
	/lib/ld-linux.so.2 (0xf7fb2000)

localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_um_8.52_i686-pc-linux-gnu
	linux-gate.so.1 (0xf7f2e000)
	libdl.so.2 => /lib/libdl.so.2 (0xf7f14000)
	libm.so.6 => /lib/libm.so.6 (0xf7e42000)
	libpthread.so.0 => /lib/libpthread.so.0 (0xf7e21000)
	libc.so.6 => /lib/libc.so.6 (0xf7c79000)
	/lib/ld-linux.so.2 (0xf7f30000)
localhost:jeandavid8[/var/lib/boinc/projects/climateprediction.net]$ ldd hadam4_se_8.52_i686-pc-linux-gnu.so
	linux-gate.so.1 (0xf7f9f000)
	libnsl.so.1 => /lib/libnsl.so.1 (0xf7e50000)   <---<<<
	libstdc++.so.6 => /lib/libstdc++.so.6 (0xf7cbd000)
	libm.so.6 => /lib/libm.so.6 (0xf7beb000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf7bce000)
	libc.so.6 => /lib/libc.so.6 (0xf7a26000)
	/lib/ld-linux.so.2 (0xf7fa1000)

Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 65652 - Posted: 18 Jul 2022, 0:15:33 UTC

Yes, even though I updated my files, all 3 (of 4) SM4 work units have now finished with an error.

So even though all looks good, trickles are reported and I got some credit, something is missing and then it errors out.

Well, 1 to go in less than 5 hours, and then I will be done with a waste of 12 days (each WU ran 4 days) of crunching with no valid results.

I may take another break from the project, unless a few types I have not run yet get work, in which case I will try them.

Thanks all for your help

Conan

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65653 - Posted: 18 Jul 2022, 1:31:30 UTC - in response to Message 65652.  

I wonder what your problem is. Here are the results for one of a group of four tasks I received recently. Currently, four more are running and I expect them to complete successfully too.

Task 22217467
Name 	hadsm4_a0ye_201310_6_933_012144700_0
Workunit 	12144700
Created 	11 Jul 2022, 10:57:52 UTC
Sent 	14 Jul 2022, 6:24:03 UTC
Report deadline 	26 Jun 2023, 11:44:03 UTC
Received 	16 Jul 2022, 22:31:14 UTC
Server state 	Over
Outcome 	Success
Client state 	Done
Exit status 	0 (0x00000000)
Computer ID 	1511241
Run time 	2 days 5 hours 30 min 38 sec
CPU time 	2 days 4 hours 52 min 58 sec
Validate state 	Valid
Credit 	9,616.92
Device peak FLOPS 	6.58 GFLOPS
Application version 	UK Met Office HadSM4 at N144 resolution v8.02 i686-pc-linux-gnu


And this one, not yet complete, has provided three trickles:

Task 22221604
Name 	hadsm4_a0q5_201310_6_933_012144403_1
Workunit 	12144403
Created 	16 Jul 2022, 9:06:07 UTC
Sent 	16 Jul 2022, 9:18:45 UTC
Report deadline 	28 Jun 2023, 14:38:45 UTC
Received 	---
Server state 	In progress
Outcome 	---
Client state 	New
Exit status 	0 (0x00000000)
Computer ID 	1511241

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 65654 - Posted: 18 Jul 2022, 2:57:25 UTC - in response to Message 65653.  

@Jean-David

The install of his Fedora 36 did not include the libnsl file that is apparently needed. This results in upload failures (for some reason). He has installed this file now and the 6th and final file did upload correctly. However, since the other five monthly zip files did not go up (before he installed libnsl), boinc marked the results as errors.


@Conan

Looking at the stderr on your task webpages, the 6th zip must have been uploaded successfully, but the other 5 monthly zips weren't. So boinc marked the result as an error. Now that libnsl is installed, you shouldn't have any more errors of this type.

KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65655 - Posted: 18 Jul 2022, 6:09:57 UTC
Last modified: 18 Jul 2022, 6:10:59 UTC

I have no inkling of what anyone's fault is, but one of my WUs has been trying to upload for the last 48 hours, while another has uploaded. Why would one upload and the other not? This is on Linux Mint.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 65656 - Posted: 18 Jul 2022, 6:39:35 UTC - in response to Message 65655.  

Please post a link, and I'll ask the project to check.
It's only a bit after 7.30am there at the moment.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 65657 - Posted: 18 Jul 2022, 11:48:49 UTC

There is some work being done on the servers at the moment, which is delaying the next batch of HadSM4 until later in the week, but that doesn't make much sense as the cause when other uploads have gone through OK. If that task is from a batch number different from all the working ones, it may be that the upload server for that batch, wherever in the world it is, is down or otherwise not functioning properly. The link will enable that to be checked, and Andy could then ping a message to the relevant university to kick said server.

Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 65658 - Posted: 18 Jul 2022, 12:59:27 UTC - in response to Message 65654.  

> @Jean-David
>
> The install of his Fedora 36 did not include the libnsl file that is apparently needed. This results in upload failures (for some reason). He has installed this file now and the 6th and final file did upload correctly. However, since the other five monthly zip files did not go up (before he installed libnsl), boinc marked the results as errors.
>
> @Conan
>
> Looking at the stderr on your task webpages, the 6th zip must have been uploaded successfully, but the other 5 monthly zips weren't. So boinc marked the result as an error. Now that libnsl is installed, you shouldn't have any more errors of this type.


Thanks geophi,

I decided to check back on a few WUs I ran in May 2021 and found they had failed for the same reason.
I had Fedora 31 at the time, but apparently I did not check why the work units were marked invalid.
If I had checked, I could have fixed this issue last year and had 4 successful results now instead of 4 failures.

I will have to check why things fail a bit better it seems. You live and learn.

Conan

KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65661 - Posted: 19 Jul 2022, 6:44:41 UTC - in response to Message 65657.  

The server was taking a short break. It has been uploaded now. Thank you.

Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 65725 - Posted: 1 Aug 2022, 6:58:09 UTC

Just had a work unit complete without error, so that missing libnsl file was the problem.

Conan