Posts by Conan

1) Message boards : Number crunching : Tasks available, but I am not getting them.
Message 71166
Posted 2 Aug 2024 by Conan
I didn't have libnsl.so.1 on my computer, so I have now installed it in case I need it later.

Conan
2) Message boards : Number crunching : Tasks available, but I am not getting them.
Message 71149
Posted 1 Aug 2024 by Conan
Mine didn't. After the 20th trickle, about when it was finishing, it failed on File Transfer:

Unable to load library wah2_se_8.27_i686-pc-linux-gnu.so
dlopen error: libnsl.so.1: cannot open shared object file: No such file or directory

I must not have had any 32-bit libraries installed, so it could not find the library.
I have now installed the file it is complaining about, even though I probably won't need it as I wanted to have just 64-bit applications running.
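
A quick way to confirm whether a library such as libnsl.so.1 is visible to the dynamic loader is simply to try loading it. The sketch below does that with Python's ctypes; the helper name `can_load` is made up for illustration. One caveat: the check only covers libraries that match the Python interpreter's own architecture, so a 64-bit Python cannot confirm that the 32-bit copy needed by the i686 wah2 application is present.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised for a missing file and for a wrong-architecture library alike.
        return False
    return True


if __name__ == "__main__":
    name = "libnsl.so.1"
    print(name, "loadable" if can_load(name) else "not loadable")
```

A False result here corresponds to the same class of failure as the dlopen error quoted above.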

Conan
3) Message boards : Number crunching : Batch 1017 Errors
Message 70977
Posted 13 Jun 2024 by Conan
Sorry, the last 7 work units failed, but not because of faulty work units.

I ran out of memory when another programme started up using 1 GB per work unit and launched 22 of them. Normally that is not a problem, but with 2 Climate Prediction WUs running at 3 to 5 GB each I had nothing left.

It took a while to get control of the computer back. I then aborted the other project's work and set it to No New Work, which should stop it from happening again.
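
The arithmetic behind that memory squeeze can be written out as a rough sketch. The function name `peak_demand_gb` is hypothetical, and the figures are just the ones quoted in this post. The machine's total RAM is not stated here, so only the combined demand is computed.

```python
def peak_demand_gb(task_groups):
    """Sum worst-case RAM for groups given as (task_count, gb_per_task)."""
    return sum(count * gb_each for count, gb_each in task_groups)


# Figures from the post: 22 tasks at 1 GB each, plus 2 CPDN tasks at 3 to 5 GB.
low = peak_demand_gb([(22, 1), (2, 3)])
high = peak_demand_gb([(22, 1), (2, 5)])
print(f"Combined demand: {low} to {high} GB")
```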

Conan
4) Message boards : Number crunching : Batch 1017 Errors
Message 70976
Posted 13 Jun 2024 by Conan
The resent tasks are now running correctly and I completed one successfully with a few more running.

Thanks
Conan
5) Message boards : Number crunching : Batch 1017 Errors
Message 70951
Posted 8 Jun 2024 by Conan
The next 2 failed the same way.

My hosts are visible, so you can see the error messages.
I am running Linux (Fedora 37) on a Ryzen 9 7900X and a 5900X. The 5900X has not returned a result yet.

Conan
6) Message boards : Number crunching : Batch 1017 Errors
Message 70949
Posted 8 Jun 2024 by Conan
Great to get some work after a very long time.

However two completed work units show an error after the 2nd trickle has been uploaded.

I think this is after the 14th zip file

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_15.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_16.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_17.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_18.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_19.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
</message>

Ran for almost 6 and a half hours before failing.
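
For anyone sifting through several of these results, the failed uploads can be pulled out of the message text with a small parser. This is only a sketch: the function name `failed_uploads` is made up, and it assumes the `<file_xfer_error>` layout is exactly as pasted above.

```python
import re

# Matches one <file_xfer_error> block as it appears in the pasted message.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def failed_uploads(message: str):
    """Return (file_name, error_code, reason) for each failed upload."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(message)
    ]


sample = """
<file_xfer_error>
<file_name>oifs_43r3_bl_a05v_2016092300_20_1017_12282038_0_r1427327128_15.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
"""
for name, code, reason in failed_uploads(sample):
    print(name, code, reason)
```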

Conan
7) Message boards : Number crunching : New work discussion - 2
Message 69557
Posted 2 Sep 2023 by Conan
Until we get more experience with volunteers running these high memory apps I think it makes sense to restrict it to a single task for now. We can change it later in light of experience.

No other projects I know of run tasks with this high memory requirements so it's not obvious how they will be received. Let's walk first before we run with this.
LHC's ATLAS tasks at 10GB are the biggest I know of. But that's 8 threads, so you don't get people trying to run huge numbers of them. Are yours going to be single threads?


YOYO@home ECM/P2 tasks take at least 11 GB per task, single thread. They are real memory hogs, which is why I stopped running them on my 32 GB machine and limit them to just 3 at a time on my 64 GB machine.
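
As a rough illustration of why the limit ends up so low, here is the division worked out. Both helper names are hypothetical, and the only inputs are the 11 GB per task and the 32 GB and 64 GB machine sizes mentioned above.

```python
def max_concurrent(total_ram_gb, per_task_gb):
    """Upper bound on tasks that fit in RAM if nothing else were running."""
    if total_ram_gb <= 0 or per_task_gb <= 0:
        return 0
    return int(total_ram_gb // per_task_gb)


def headroom_gb(total_ram_gb, per_task_gb, task_count):
    """RAM left for the OS and other projects at a chosen task count."""
    return total_ram_gb - per_task_gb * task_count


print(max_concurrent(32, 11))   # hard ceiling on the 32 GB machine
print(max_concurrent(64, 11))   # hard ceiling on the 64 GB machine
print(headroom_gb(64, 11, 3))   # what 3 at a time leaves free on 64 GB
```

Running 3 at a time on the 64 GB machine leaves roughly half the RAM for everything else, whereas even 2 on the 32 GB machine would leave very little.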

Conan
8) Message boards : Number crunching : New work discussion - 2
Message 69537
Posted 28 Aug 2023 by Conan
Any new work for 64 bit coming along? I noticed a couple of new entries on the server status page

OpenIFS 43r3
OpenIFS 43r3 Baroclinic Lifecycle
OpenIFS 43r3 Perturbed Surface
OpenIFS 43r3 Cubic Octahedral grid tco95 l91
OpenIFS 43r3 Linear grid tl255 l91


Thanks
Conan
9) Message boards : Number crunching : New work discussion - 2
Message 68914
Posted 18 Jun 2023 by Conan
Although not related to new work, but following on from the last couple of posts:
CMDock uses a wrapper and it shows under Linux.
I believe that YAFU also uses a wrapper, and possibly YOYO, SRBase, TNGrid(?) and a few others. In some cases it is needed because of the type of programme being used or the code it has been written in.

A few other projects also use a "trickle up" method to keep the server updated with progress (PrimeGrid is one), and some of these projects need a wrapper for this purpose.

Conan
10) Message boards : Number crunching : Server Status page questions
Message 68604
Posted 19 Mar 2023 by Conan
I have also wondered about the server page.

UK Met Office Coupled Model Full Resolution Ocean has had 927 tasks "in progress" for many months, but I have seen no indication that any have been returned and the number never changes.

Weather At Home 2 (wah2) (region independent) has had 4,731 tasks in progress, again for many months, and again I have not seen any activity (maybe 1 came back 4 months ago, but I can't be sure).

What is happening with these work units?

Conan
11) Message boards : Number crunching : Upload server is out of disk space
Message 67724
Posted 14 Jan 2023 by Conan
Hi Kali,

The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions.

Dave


Actually Dave, Hobart is in Tasmania, Australia. Not NZ (New Zealand).

Conan
12) Message boards : Number crunching : The uploads are stuck
Message 67538
Posted 11 Jan 2023 by Conan
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan


It has changed to "transient HTTP error" now so still not working here yet (Australia).

Server Status has not changed yet, still showing nothing.

Conan

PS: Some files are now moving. Possibly because of the load, some fail and must retry later while others go through, at speeds from as low as 17 kB/s to as high as 1,700 kB/s.
13) Message boards : Number crunching : The uploads are stuck
Message 67525
Posted 10 Jan 2023 by Conan
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan
14) Message boards : Number crunching : Tasks failing on Ubuntu 22
Message 67347
Posted 5 Jan 2023 by Conan
If you changed the "leave tasks in memory" option but did not have BOINC reread the file, the change may not take effect until the file is read.
Restarting BOINC would also read the file.
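
To check what the file on disk actually says before rereading it, something like the sketch below could be used. It assumes the option is stored as a `<leave_apps_in_memory>` element in BOINC's `global_prefs_override.xml`; the function name is hypothetical, and the sample XML is a trimmed-down stand-in rather than a real preferences file.

```python
import xml.etree.ElementTree as ET


def leave_apps_in_memory(prefs_xml: str) -> bool:
    """Return True if the preferences XML enables leave_apps_in_memory."""
    root = ET.fromstring(prefs_xml)
    value = root.findtext("leave_apps_in_memory", default="0")
    return value.strip() not in ("", "0")


# Trimmed-down stand-in for the contents of global_prefs_override.xml.
sample = """<global_preferences>
   <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>"""
print(leave_apps_in_memory(sample))
```

Even when this reports the setting as enabled, the running client only picks it up once the file has been reread or BOINC has been restarted.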

Conan
15) Message boards : Number crunching : Hardware for new models.
Message 67296
Posted 4 Jan 2023 by Conan
I saw some test results with the AMD RYZEN 5950X, RYZEN 7950X, INTEL 12900 and INTEL 13900 (I think they were the model names).

When all were under full load for whatever test they were doing:

RYZEN 9 5950X used 130 Watts
RYZEN 9 7950X used 270 Watts (or thereabouts)
INTEL 12900 used 285-290 Watts (or thereabouts)
INTEL 13900 used 315 Watts (or thereabouts)

Can't point you to the tests but they were on YouTube, along with others showing similar results.

So the RYZEN 5950X may not be as powerful as the new models, but for energy efficiency it is hard to beat.

That's of course if you can find them, they are getting harder to find.
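
To put those wattages in running-cost terms, here is a small sketch converting full-load power into energy per day. It only covers energy, since the tests' throughput figures are not quoted here. The dictionary and function names are made up, and the 290 W entry takes the upper end of the quoted 285-290 W range.

```python
# Full-load power figures as quoted above (approximate).
FULL_LOAD_WATTS = {
    "RYZEN 9 5950X": 130,
    "RYZEN 9 7950X": 270,
    "INTEL 12900": 290,  # upper end of the quoted 285-290 W
    "INTEL 13900": 315,
}


def kwh_per_day(watts):
    """Energy used in 24 hours of continuous full load, in kWh."""
    return watts * 24 / 1000


for cpu, watts in FULL_LOAD_WATTS.items():
    print(f"{cpu}: {kwh_per_day(watts):.2f} kWh/day")
```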

I run a RYZEN 9 5900X, which has 12 cores and 24 threads and should use even less power as it has fewer cores than the 5950X.
It has 64 GB of RAM and, along with a full complement of other BOINC projects, easily runs 9 CPDN work units at a time. Memory use only gets to about 42 GB max, depending on what I am running at the time (everything, not just CPDN). It may get higher than 42 GB, but I have the headroom to cover that.

BOINC has not downloaded more than 9 work units at any one time, probably because I am running a lot of other projects at the same time.

Conan
16) Message boards : Number crunching : OpenIFS Discussion
Message 66999
Posted 22 Dec 2022 by Conan
All 9 work units that I had running overnight have completed successfully.

Running on an AMD Ryzen 9 5900x, 64GB RAM, all 24 threads used to run BOINC programmes at the same time as the ClimatePrediction models.
All took around 17 hours 10 minutes run time.

Conan
17) Message boards : Number crunching : Late Validation pending
Message 66991
Posted 21 Dec 2022 by Conan
Well it seems that these files have finally been validated and I have been awarded credit for them, I think.

I have noticed a clean up has taken place and a lot of the old work units that I have done over the years have been removed.
Those 2 pending jobs were among them. I was awarded a small amount of credit this week when I had not done any work, and now it seems that the database has had a bit of a clean out and fix up. Good to see.

Conan
18) Message boards : Number crunching : OpenIFS Discussion
Message 66990
Posted 21 Dec 2022 by Conan
G'Day Glenn,

I think you may have misread what I wrote.

The 11.3 GB was not a file size but the amount of disk writes made in that first 2 hours (now, after 5 hours, well over 30 GB).
The 2.7 to 4.6 GB were RAM amounts that each work unit was using.

This was all taken from System Monitor.

I did what you asked:

% cd slots/26
% du -hs . # note the '.'
1.2G .

This is the same as your example.

% cd projects/climateprediction.net
% du -hs .
1.2G .

This is similar to your example.

du -hs srf*

768 MB srf00370000.0001

So all is running fine. I think it was just a bit of a misunderstanding about data amounts and RAM usage.
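
For anyone without `du` to hand, roughly the same check can be sketched in Python. The function name `dir_size_bytes` is made up. It sums file sizes rather than allocated disk blocks, so the total may differ a little from what `du -hs` reports. The demo runs on a throwaway directory rather than a real slot.

```python
import os
import tempfile


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


# Self-check on a temporary directory holding 1500 + 500 bytes.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as fh:
        fh.write(b"\0" * 1500)
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as fh:
        fh.write(b"\0" * 500)
    demo_total = dir_size_bytes(tmp)

print(demo_total)  # 2000
```

Pointing it at a slot directory, for example `dir_size_bytes("slots/26")` from the BOINC data directory, would give the figure to compare against the `du` output.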

Thanks
Conan
19) Message boards : Number crunching : OpenIFS Discussion
Message 66983
Posted 21 Dec 2022 by Conan
These Oifs _ps tasks really test your system out.

I am running 9 at once, each using from 2.7 to 4.2 GB of RAM. After 2 hours of run time they have each written 11.3 GB of data to disk (101.7 GB in total), which is huge.
RAM in use is hitting 50 GB out of 64 GB, but I am also running LODA tasks which each use 1 GB of RAM. All 24 threads are running.
12% in and running fine so far.

Conan
20) Message boards : Number crunching : OpenIFS Discussion
Message 66795
Posted 6 Dec 2022 by Conan
My resent task 22249228 has been sent out twice before.

Previous Task 22246540 and Task 22248943

Task 22246540 has no Stderr. It failed with a Run Time of 1 Day 5 Hours and a CPU Time of 31 Minutes. It also had an unusually high Peak Disk Usage of 23,961.87 MB (about 24 GB), way above the norm from what I have seen.

Task 22248943 has the error "Process exited with code 9" but other than that seems to have run fine. This one belonged to wateroakley.

I was able to run this WU to completion without error.


Another resent task I have running is Task 22249324

Previous Task 22247025 and Task 22249194

Task 22247025 on computer 1524992 had a Run Time of 42 Minutes and a CPU Time of 20 Seconds, with a Peak Disk Usage of just 404.06 MB.
This computer still has work on it but has not completed a successful OpenIFS WU. All its failed work units have the same long run times and short CPU times, and they have different error codes as well: codes 1, 5 and 148 all appear on this computer.

Task 22249194 on computer 1504810 has no Stderr, with a Run Time of 1 Day 1 Hour and a CPU Time of 7 Hours.
This computer has run 9 OpenIFS work units and all have failed with the same long Run Time and short CPU Time.
This computer belongs to happywetter.at.

So a few different reasons that some work units have failed or thrown an error.
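
The "long Run Time, short CPU Time" pattern can be put into numbers with a quick sketch. The times are the ones listed above for the three failed tasks that show it. The helper name `cpu_ratio` and the 0.5 cut-off are assumptions for illustration only, not anything the project uses.

```python
from datetime import timedelta


def cpu_ratio(run_time: timedelta, cpu_time: timedelta) -> float:
    """Fraction of wall-clock run time that was spent on the CPU."""
    run_seconds = run_time.total_seconds()
    if run_seconds <= 0:
        return 0.0
    return cpu_time.total_seconds() / run_seconds


# (Run Time, CPU Time) as quoted in the post.
tasks = {
    "22246540": (timedelta(days=1, hours=5), timedelta(minutes=31)),
    "22247025": (timedelta(minutes=42), timedelta(seconds=20)),
    "22249194": (timedelta(days=1, hours=1), timedelta(hours=7)),
}

THRESHOLD = 0.5  # assumed cut-off for "mostly idle"
stalled = [tid for tid, (run, cpu) in tasks.items() if cpu_ratio(run, cpu) < THRESHOLD]

for tid, (run, cpu) in tasks.items():
    print(tid, f"{cpu_ratio(run, cpu):.3f}")
```

All three come out far below the cut-off, meaning each task spent most of its run time not using the CPU at all.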

Conan

I completed Task 22249324 successfully in just under 17 1/2 hours.

©2024 cpdn.org