climateprediction.net (CPDN) home page
Thread 'Misconfigured Machine?'

_Ryle_

Joined: 17 Aug 05
Posts: 22
Credit: 16,057,688
RAC: 15,434
Message 64408 - Posted: 25 Aug 2021, 13:33:23 UTC - in response to Message 64391.  

Sadly it didn't help.
This host has over 6,000 failures now: https://www.cpdn.org/show_host_detail.php?hostid=1517479

Even astronomers can be neglectful, it seems :)

It looks like 2 cloud computers.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64409 - Posted: 25 Aug 2021, 14:13:44 UTC - in response to Message 64408.  

Email sent re 2 of his computers.

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 64410 - Posted: 26 Aug 2021, 22:36:43 UTC - in response to Message 64408.  
Last modified: 26 Aug 2021, 22:38:17 UTC

Looks like he is using weird locations for the task files, so BOINC Manager cannot find them. Possibly some sort of cloud storage.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64411 - Posted: 27 Aug 2021, 3:29:53 UTC - in response to Message 64410.  

Looks like he is using weird locations for the task files, so BOINC Manager cannot find them. Possibly some sort of cloud storage.

Could he have gotten his boinc-client and boincmgr from flatpak?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64412 - Posted: 27 Aug 2021, 5:50:07 UTC

I haven't heard from Andy yet, so I've sent Eric an email about them.

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 64413 - Posted: 27 Aug 2021, 22:38:24 UTC - in response to Message 64410.  

Extract from the stderr output of one of the tasks on Eric's computer:

unzip: cannot find or open /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_se_8.02_i686-pc-linux-gnu.zip, /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_se_8.02_i686-pc-linux-gnu.zip.zip or /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_se_8.02_i686-pc-linux-gnu.zip.ZIP.
unzip: cannot find or open /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_um_8.02_i686-pc-linux-gnu.zip, /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_um_8.02_i686-pc-linux-gnu.zip.zip or /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_um_8.02_i686-pc-linux-gnu.zip.ZIP.
unzip: cannot find or open hadsm4_data_8.02_i686-pc-linux-gnu.zip, hadsm4_data_8.02_i686-pc-linux-gnu.zip.zip or hadsm4_data_8.02_i686-pc-linux-gnu.zip.ZIP.
unzip: cannot find or open hadsm4_a10a_201310_6_911_012090511.zip, hadsm4_a10a_201310_6_911_012090511.zip.zip or hadsm4_a10a_201310_6_911_012090511.zip.ZIP.
cpdnmonitor: cannot open input file /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_se_8.02_i686-pc-linux-gnu.so after 11 attempts
cpdnmonitor: cannot open input file /mydisks/a/boinc_lib/projects/climateprediction.net/hadsm4_um_8.02_i686-pc-linux-gnu after 11 attempts

if it helps.
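
For anyone who wants to check this on their own machine, here is a minimal sketch (not anything CPDN or BOINC ships) that pulls the file paths out of stderr text like the extract above and tests whether each one exists locally. The function names `missing_paths` and `report` are made up for illustration, and the regular expressions only assume the two message shapes seen in this extract.

```python
import re
import sys
from pathlib import Path

# Patterns assume the two message shapes seen in the stderr extract:
#   unzip: cannot find or open <path>, <path>.zip or <path>.ZIP.
#   cpdnmonitor: cannot open input file <path> after <n> attempts
UNZIP_RE = re.compile(r"^unzip: cannot find or open (\S+?),")
MONITOR_RE = re.compile(
    r"^cpdnmonitor: cannot open input file (\S+) after \d+ attempts"
)


def missing_paths(stderr_text):
    """Return, in order, the paths the stderr text says could not be opened."""
    paths = []
    for line in stderr_text.splitlines():
        line = line.strip()
        for pattern in (UNZIP_RE, MONITOR_RE):
            match = pattern.match(line)
            if match:
                paths.append(match.group(1))
                break
    return paths


def report(stderr_text):
    """Map each reported path to True/False: does it exist on this machine?

    Relative paths are resolved against the current working directory,
    which will normally differ from the task's slot directory.
    """
    return {p: Path(p).exists() for p in missing_paths(stderr_text)}


if __name__ == "__main__":
    # Usage: python check_stderr.py < stderr.txt
    for path, exists in report(sys.stdin.read()).items():
        print(("found   " if exists else "MISSING ") + path)
```

If the absolute paths under the project directory come back as missing, that would point at the data directory location rather than at the tasks themselves.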

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64414 - Posted: 28 Aug 2021, 3:19:06 UTC - in response to Message 64413.  

I have no idea what his file system is like. My files tend to look like this, and these tasks are running and have not failed yet:
/var/lib/boinc/projects/climateprediction.net/hadam4_um_8.09_i686-pc-linux-gnu
/var/lib/boinc/projects/climateprediction.net/hadam4_um_8.09_i686-pc-linux-gnu.zip
/var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu
/var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu.zip
/var/lib/boinc/slots/0/hadam4_um_8.09_i686-pc-linux-gnu.zip
/var/lib/boinc/slots/1/hadam4_um_8.52_i686-pc-linux-gnu.zip
/var/lib/boinc/slots/4/hadam4_um_8.52_i686-pc-linux-gnu.zip
/var/lib/boinc/slots/6/hadam4_um_8.09_i686-pc-linux-gnu.zip
/var/lib/boinc/slots/7/hadam4_um_8.52_i686-pc-linux-gnu.zip

mngn
Joined: 13 Jul 18
Posts: 38
Credit: 62,933,508
RAC: 84,702
Message 64431 - Posted: 7 Sep 2021, 13:53:57 UTC

All crashes.
https://www.cpdn.org/results.php?hostid=1506194

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 64688 - Posted: 23 Oct 2021, 22:29:58 UTC - in response to Message 64410.  

Eric's computers are still getting file system errors. I got another of his failed tasks as a resend today.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64689 - Posted: 24 Oct 2021, 2:51:33 UTC - in response to Message 64431.  

The first two were missing 32-bit libraries.
I did not bother to look at the rest.
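
A rough way to check for this kind of problem yourself, as a sketch only: the model executables are i686 (32-bit) builds, so a 64-bit Linux host needs 32-bit support installed. The snippet below reads the ELF header of a file to report whether it is a 32-bit or 64-bit binary, and looks for the 32-bit loader at `/lib/ld-linux.so.2`. That loader path is the usual one on x86 Linux, but treat it as an assumption for any particular distribution. The function names are invented for this example.

```python
import sys
from pathlib import Path

ELF_MAGIC = b"\x7fELF"


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not a valid ELF.

    Byte 4 of the ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
    """
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return {1: 32, 2: 64}.get(header[4])


def has_32bit_loader(loader="/lib/ld-linux.so.2"):
    """Check for the 32-bit dynamic loader (assumed conventional path)."""
    return Path(loader).exists()


if __name__ == "__main__":
    # Usage: python elfcheck.py /path/to/model_executable
    for name in sys.argv[1:]:
        print(name, "->", elf_class(name))
    print("32-bit loader present:", has_32bit_loader())
```

A 32-bit executable on a host with no 32-bit loader would be consistent with the missing-library failures, though the task's stderr remains the authoritative record of what went wrong.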

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 64850 - Posted: 6 Dec 2021, 23:23:46 UTC

Still getting multiple failures from Eric's computer systems (file location errors) and also from Science United (similar type of problem).

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,748,307
RAC: 7,546
Message 64971 - Posted: 15 Jan 2022, 17:50:03 UTC
Last modified: 15 Jan 2022, 18:02:28 UTC

This one https://www.cpdn.org/results.php?hostid=1510055 has crashed all ~12k WUs and is continuing to do so.

This one https://www.cpdn.org/results.php?hostid=829775 has been crashing every task since 2020; it had 67 valid results before that.

And this one https://www.cpdn.org/results.php?hostid=1517479 has crashed all ~ 10k WUs and is continuing to do so.

Can this reporting be automated somehow? The level of micromanagement CPDN requires and the reluctance of staff to adjust some basic things is becoming daunting.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64972 - Posted: 15 Jan 2022, 23:19:55 UTC

I'll send another email tomorrow, when the weekend is over.

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 64973 - Posted: 15 Jan 2022, 23:39:13 UTC - in response to Message 64972.  
Last modified: 15 Jan 2022, 23:41:30 UTC

Another 2 apart from Eric's.

1515416 2.6k errors
1523499 2.2k errors

Isn't it possible to "blacklist" machines with too many errors or unfinished tasks?
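
To make the idea concrete, here is a sketch of the kind of rule I mean. It is purely hypothetical and is not how the CPDN or BOINC server actually decides anything. The thresholds (`min_tasks`, `max_error_rate`) and the `HostStats` type are invented for illustration, and the valid counts of zero are placeholders because I only have the error totals.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    host_id: int
    errors: int  # tasks that ended in error
    valid: int   # tasks that validated


def should_block(stats, min_tasks=100, max_error_rate=0.95):
    """Hypothetical rule: block a host once it has returned enough tasks
    and nearly all of them were errors. Both thresholds are made up."""
    total = stats.errors + stats.valid
    if total < min_tasks:
        return False  # too little history to judge
    return stats.errors / total >= max_error_rate


if __name__ == "__main__":
    # Error totals are the rounded figures quoted above ("2.6k", "2.2k");
    # valid=0 is an assumption, not a reported number.
    hosts = [
        HostStats(host_id=1515416, errors=2600, valid=0),
        HostStats(host_id=1523499, errors=2200, valid=0),
    ]
    for h in hosts:
        print(h.host_id, "block" if should_block(h) else "keep")
```

Requiring a minimum task count is there so a new machine with a handful of early failures would not be shut out straight away.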

wateroakley
Joined: 6 Aug 04
Posts: 195
Credit: 28,588,752
RAC: 9,078
Message 65247 - Posted: 8 Mar 2022, 15:01:54 UTC

Between them, these hosts have crashed over 1,000 cm3s.

546: https://www.cpdn.org/results.php?hostid=1368852
362: https://www.cpdn.org/results.php?hostid=1492772
93: https://www.cpdn.org/results.php?hostid=1489800
87: https://www.cpdn.org/results.php?hostid=1368870

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 65263 - Posted: 11 Mar 2022, 4:47:47 UTC
Last modified: 11 Mar 2022, 4:54:30 UTC

MacOS, all crashes:
97: https://www.cpdn.org/results.php?hostid=1493719
112: https://www.cpdn.org/results.php?hostid=1368870
138: https://www.cpdn.org/results.php?hostid=1437487
134: https://www.cpdn.org/results.php?hostid=1433467
94: https://www.cpdn.org/results.php?hostid=1478457
189: https://www.cpdn.org/results.php?hostid=1441240

FreeBSD, nothing but errors:
https://www.cpdn.org/results.php?hostid=1523499 (a lot of Insufficient Memory/Stack Space Available!)

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,416,193
RAC: 15,520
Message 65576 - Posted: 16 Jun 2022, 22:29:38 UTC

More crashers:

1532165 and 1477807 --Missing libraries

1524953 --odd errors

1517679 (Eric's) now up to 5889 failed tasks!!!