climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65514 - Posted: 8 Jun 2022, 21:50:15 UTC - in response to Message 65502.  

It took me very little time to get all the files for the one work unit I just got.
I have a 75 Megabit/sec Internet connection with Verizon FiOS.
Wed 08 Jun 2022 05:32:05 PM EDT | climateprediction.net | Sending scheduler request: Requested by user.
Wed 08 Jun 2022 05:32:06 PM EDT | climateprediction.net | Scheduler request completed: got 1 new tasks
Wed 08 Jun 2022 05:32:06 PM EDT | climateprediction.net | Project requested delay of 3636 seconds
Wed 08 Jun 2022 05:32:08 PM EDT | climateprediction.net | Started download of hadam4h_a1qd_200011_5_932_012143257.zip
Wed 08 Jun 2022 05:32:08 PM EDT | climateprediction.net | Started download of a1qd_932_atmos.gz
Wed 08 Jun 2022 05:32:10 PM EDT | climateprediction.net | Finished download of hadam4h_a1qd_200011_5_932_012143257.zip
Wed 08 Jun 2022 05:32:10 PM EDT | climateprediction.net | Started download of ic_N216_2003_11_000057_f.nc.gz
Wed 08 Jun 2022 05:32:15 PM EDT | climateprediction.net | Finished download of ic_N216_2003_11_000057_f.nc.gz
Wed 08 Jun 2022 05:32:15 PM EDT | climateprediction.net | Started download of ancil_PAMIP_tos_fut2CArctic_N216-clim.gz
Wed 08 Jun 2022 05:32:18 PM EDT | climateprediction.net | Finished download of ancil_PAMIP_tos_fut2CArctic_N216-clim.gz
Wed 08 Jun 2022 05:32:18 PM EDT | climateprediction.net | Started download of ancil_PAMIP_siconc_fut2CArctic_N216-clim.gz
Wed 08 Jun 2022 05:32:20 PM EDT | climateprediction.net | Finished download of ancil_PAMIP_siconc_fut2CArctic_N216-clim.gz
Wed 08 Jun 2022 05:32:20 PM EDT | climateprediction.net | Started download of so2dms_rcp45_N216_1999_2010.gz
Wed 08 Jun 2022 05:32:48 PM EDT | climateprediction.net | Finished download of so2dms_rcp45_N216_1999_2010.gz
Wed 08 Jun 2022 05:32:48 PM EDT | climateprediction.net | Started download of oxi.addfa.N216L38.gz
Wed 08 Jun 2022 05:32:49 PM EDT | climateprediction.net | Finished download of a1qd_932_atmos.gz
Wed 08 Jun 2022 05:32:49 PM EDT | climateprediction.net | Started download of ozone_rcp45_N216L38_1999_2010v2.gz
Wed 08 Jun 2022 05:32:53 PM EDT | climateprediction.net | Finished download of ozone_rcp45_N216L38_1999_2010v2.gz
Wed 08 Jun 2022 05:32:53 PM EDT | climateprediction.net | Started download of VOLC38_LR_N216.gz
Wed 08 Jun 2022 05:32:54 PM EDT | climateprediction.net | Finished download of VOLC38_LR_N216.gz
Wed 08 Jun 2022 05:33:15 PM EDT | climateprediction.net | Finished download of oxi.addfa.N216L38.gz
Wed 08 Jun 2022 05:38:29 PM EDT | climateprediction.net | Starting task hadam4h_a1qd_200011_5_932_012143257_0

It has now been running a little over 8 minutes and has not crashed.

SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 65515 - Posted: 8 Jun 2022, 23:00:59 UTC
Last modified: 8 Jun 2022, 23:37:48 UTC

Yeah, it's better now...

Yesterday, when all the WUs hit, and all the starving Linux servers were fetching everything they could, it was pretty bad.

I think everyone's fed now, to within the bounds of the initial slug of work. And based on how long some people keep returning credit, probably plenty of extra days of WUs. Though if you're going to do that, benchmark first - otherwise it thinks the machine will take 100+ days to get WUs done on the default performance, and you won't get more than a full CPU set.

I've got my pair of 3900Xs up and running and loaded down, with my eDRAM 5775C finishing out some other stuff before it chews into the new stuff (I had it loaded down with Einstein WUs).

IIRC, a couple of other people here have similar 3000 series Ryzens - I've handwavingly limited mine to 12 tasks, under the logic that even with 64MB of L3, they're still pretty cache hungry. But I don't have a good way to tell, on AMD, if I'm managing more throughput with 24 (at least on the machine with enough RAM for it). On Intel, there's a performance counter utility that lets me monitor total instructions retired per unit time, which is throughput, so I can tune for that. But I've not found a good way to measure that on AMD/Linux yet.

//EDIT: Oh hey. Run 'modprobe msr', have msr-tools installed, and then 'rdmsr -a -x 0xC00000E9' should get you the retired instruction counts per core. I'll write a little wrapper for it here and then I can benchmark throughput.
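
A minimal sketch of the kind of wrapper described above, in Python. It assumes that `rdmsr -a -x` prints one hexadecimal value per line (one line per logical CPU) and that the counter is 48 bits wide; both are assumptions to check on your own hardware. The function names are illustrative and not taken from any published tool.

```python
import subprocess
import time

IRPERF_MSR = "0xC00000E9"   # register named in the post above
COUNTER_BITS = 48           # assumed counter width; adjust if your CPU differs


def parse_rdmsr(text):
    """Turn `rdmsr -a -x` output (assumed: one hex value per CPU per line) into ints."""
    return [int(value, 16) for value in text.split()]


def instructions_per_second(before, after, seconds, bits=COUNTER_BITS):
    """Per-CPU retired-instruction rate between two samples, tolerating one wraparound."""
    modulus = 1 << bits
    return [((b - a) % modulus) / seconds for a, b in zip(before, after)]


def read_counters():
    """Sample every CPU once. Needs root, the msr module loaded, and msr-tools installed."""
    out = subprocess.run(
        ["rdmsr", "-a", "-x", IRPERF_MSR],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_rdmsr(out)


def measure(interval=10.0):
    """Retired instructions per second, summed over all CPUs, across `interval` seconds."""
    first = read_counters()
    start = time.monotonic()
    time.sleep(interval)
    second = read_counters()
    elapsed = time.monotonic() - start
    return sum(instructions_per_second(first, second, elapsed))
```

Run as root, `measure(10)` gives one figure for the whole machine; comparing that figure with 12 tasks and with 24 tasks running is one way to judge which setting gets more work done.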

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65516 - Posted: 8 Jun 2022, 23:52:44 UTC - in response to Message 65515.  

I think everyone's fed now, to within the bounds of the initial slug of work. And based on how long some people keep returning credit, probably plenty of extra days of WUs. Though if you're going to do that, benchmark first - otherwise it thinks the machine will take 100+ days to get WUs done on the default performance, and you won't get more than a full CPU set.


I got only one task, but I did not benchmark first. I last booted my machine about 8 days ago, so I surely did one then, but OTOH, no ClimatePrediction tasks were running.

My one task I got estimated a little over 8 days to run, which is what it normally takes.

I am now up to 2 hours executed and 8 days 3 hours estimated time to go.

I just ran the benchmarks and the estimated times to completion did not change, but maybe the benchmark is looked at only once, when the task first begins.

SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 65517 - Posted: 9 Jun 2022, 0:05:08 UTC

That's about right. My new builds are showing "100 days" to completion. Far as I can tell, "performance stats" only get processed into estimated time at the task start - it doesn't update once they've been created. Not a problem, it doesn't impact actual performance, only estimates.
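
As a rough illustration of why an unbenchmarked machine can show "100 days": in the simplest model, the estimate is the task's estimated floating-point work divided by the speed the client believes the host has. The sketch below uses only that simplified model (the real client applies further corrections), and every number in it is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimated_days(task_flops, host_flops):
    """Naive runtime estimate: work in the task divided by the speed the client assumes."""
    return task_flops / host_flops / SECONDS_PER_DAY


# All numbers below are made up for illustration only.
task_flops = 2.0e15          # hypothetical size of one work unit
benchmarked_flops = 3.0e9    # hypothetical measured speed of one core
placeholder_flops = 2.0e8    # hypothetical pre-benchmark placeholder speed

print(f"benchmarked:   {estimated_days(task_flops, benchmarked_flops):.1f} days")
print(f"unbenchmarked: {estimated_days(task_flops, placeholder_flops):.1f} days")
```

If the estimate is only computed when a task starts, as suggested above, running the benchmarks later changes the assumed speed for new tasks but leaves the figure shown for running ones alone.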

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 65518 - Posted: 9 Jun 2022, 1:03:24 UTC

Mine are estimating just under 12 days each.
My estimate is: Who cares, I've got some more work at last. :)

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 65519 - Posted: 9 Jun 2022, 5:21:47 UTC

And some good news.

@dave18874629 as @suzannerosier1 says we hope to have some windows work released soon for the new NZ25 region and we also have work in the pipeline to develop an EAS25 region that will also have work coming.

However I am willing to bet that the servers get emptied pretty quickly once these appear.

SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 65520 - Posted: 9 Jun 2022, 5:24:48 UTC

Aw yeah. Make the machines work for their pay! :D

I got my AMD instruction counter working; I'll clean it up a bit and stick it on GitHub soon. I'd like to figure out how to get DRAM read/write bandwidth while I'm in here...

Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65523 - Posted: 9 Jun 2022, 10:15:11 UTC - in response to Message 65511.  

They haven't updated the 32bit windows Boinc for 3.5+ years and the 32bit linux version for 7.5+ years. I guess they think if you are running older operating systems, you can download older versions of boinc and make do. The problem came about last fall due to the certificates being installed with older version of Windows boinc expiring. They came out with a new 64 bit version with a newer certificates list, but no new 32 bit version. They do have a certificates file to download at the top of this page: https://boinc.berkeley.edu/download_all.php if one is still running the older Windows versions.
I remember where I got this from now, I got the impression Boinc is (at least partially) still 32 bit because it fails to detect a GPU is over 4GB. The 64 bit projects that use the GPU can see all the VRAM though.

Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65524 - Posted: 9 Jun 2022, 10:18:14 UTC - in response to Message 65519.  
Last modified: 9 Jun 2022, 10:24:41 UTC

And some good news.

@dave18874629 as @suzannerosier1 says we hope to have some windows work released soon for the new NZ25 region and we also have work in the pipeline to develop an EAS25 region that will also have work coming.

However I am willing to bet that the servers get emptied pretty quickly once these appear.
Cool! I have 90 CPU cores waiting here, they're all checking for work, but I don't know how often, I think it's once an hour unless they're busy with other projects, which is always. I have a small buffer though. [Engaging tickle mechanism]

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 65525 - Posted: 9 Jun 2022, 10:51:48 UTC

And some good news.
And the bad news is that the three testing tasks have completed but the zips are not uploading. I have informed the project but still waiting for a reply. Most likely Andy just needs to kick something to restart it. Unlikely to delay the as yet unknown date of the main site Windows tasks by more than a day or two at the most.

Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65526 - Posted: 9 Jun 2022, 11:27:44 UTC

Do you know the rough date of Windows tasks? Are we talking a week or a month?

Also, is the server status page accurate? It claims there are only 4 Linux users doing 4186 tasks!

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65527 - Posted: 9 Jun 2022, 12:32:31 UTC - in response to Message 65526.  

is the server status page accurate? It claims there are only 4 Linux users doing 4186 tasks!


That means only 4 Linux users have returned results in the last 24 hours (or something like that); it is not the number of (Linux) users working on work units.
My machine is working on three ClimatePrediction work units at the moment, 2 MilkyWay work units (one of which uses 4 processor cores),

Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65528 - Posted: 9 Jun 2022, 12:59:07 UTC - in response to Message 65527.  

But surely if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloading and trickle ups?

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,708,278
RAC: 9,361
Message 65529 - Posted: 9 Jun 2022, 13:02:08 UTC - in response to Message 65523.  

I remember where I got this from now, I got the impression Boinc is (at least partially) still 32 bit because it fails to detect a GPU is over 4GB. The 64 bit projects that use the GPU can see all the VRAM though.
That bit, at least, has been corrected in the forthcoming v7.20 release (preview available for testing - see BOINC message board).

There were earlier problems associated with the 64-bit SSL security libraries failing to run on some low-power Intel processors - that was corrected two or three years ago. I wonder if the WINE error could have been related to that one?

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,708,278
RAC: 9,361
Message 65530 - Posted: 9 Jun 2022, 13:06:11 UTC - in response to Message 65525.  

... but the zips are not uploading. I have informed the project but still waiting for a reply. Most likely Andy just needs to kick something to restart it. Unlikely to delay the as yet unknown date of the main site Windows tasks by more than a day or two at the most.
Previous practice was to upload the zips to a server located in the region under investigation - NZ, in this case. The kick might be better directed there...

Have you checked <http_debug> log flags to see what the exact problem is?
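
For anyone wanting to switch those flags on by hand, here is a small sketch that builds the `log_flags` section of a `cc_config.xml`. It assumes the three transfer-related flags are named `http_debug`, `http_xfer_debug` and `file_xfer_debug`; the function name is made up for illustration. It produces a fresh file, so merge by hand if you already have a `cc_config.xml` with other settings.

```python
import xml.etree.ElementTree as ET

# Log flags assumed to be useful for diagnosing stuck uploads.
TRANSFER_FLAGS = ("http_debug", "http_xfer_debug", "file_xfer_debug")


def build_cc_config(flags=TRANSFER_FLAGS):
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    if hasattr(ET, "indent"):   # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

Save the output as `cc_config.xml` in the BOINC data directory and have the client re-read its config files (for example with `boinccmd --read_cc_config`); the extra detail then shows up in the event log on the next transfer attempt.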

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 65531 - Posted: 9 Jun 2022, 14:04:21 UTC - in response to Message 65528.  
Last modified: 9 Jun 2022, 14:24:02 UTC

But surely if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloading and trickle ups?


Not 100% sure but I am fairly certain it is only the number who have reported completed tasks. That is based on the dev site which uses the same server code and the numbers are low enough that I can be sure trickle ups and zips don't count in that number.

Edit: so none of the tasks from the two most recent batches, which will take about 9 days on my Ryzen, will be anywhere near showing on that metric yet.

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 65532 - Posted: 9 Jun 2022, 14:23:03 UTC
Last modified: 9 Jun 2022, 17:44:46 UTC

Previous practice was to upload the zips to a server located in the region under investigation - NZ, in this case. The kick might be better directed there...
Usually Andy then has to contact them to tell them it needs kicking.

Have you checked <http_debug> log flags to see what the exact problem is?


upload4. I posted a long extract from the event log, with http_debug, http_xfer_debug and file_xfer_debug all checked, on the Trello board for the batch. Andy probably won't see that, though, unless Sarah or the batch owner points him to it.

Edit: Have short circuited that bit and emailed Andy.

Edit2: Andy has emailed sysadmin for the relevant machine.

Edit3: Right now 05:45 in NZ. Might have to wait a bit before the machine gets kicked.

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 65533 - Posted: 9 Jun 2022, 14:26:57 UTC

Do you know the rough date of Windows tasks? Are we talking a week or a month?


Nothing more than the "soon" I posted below (or above if you don't have posts sorted by "most recent first").

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65535 - Posted: 9 Jun 2022, 15:13:29 UTC - in response to Message 65531.  

But surely if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloading and trickle ups?

Not 100% sure but I am fairly certain it is only the number who have reported completed tasks. That is based on the dev site which uses the same server code and the numbers are low enough that I can be sure trickle ups and zips don't count in that number.


I now have three N216 work units running. IIRC, they trickle every 1/8 of the time through. This would mean that my first trickle should come about 24 hours after it started. In this case, the oldest of my three work units is about 16 hours in and is about 9% complete. It has been so long that I do not remember if this counts on the server page.
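
A rough sketch of that arithmetic, assuming progress is linear and that trickles are evenly spaced at the end of each eighth of the run (the 1/8 figure is the recollection above, not a confirmed project setting). The function names are illustrative.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Linear extrapolation of total runtime from progress so far."""
    return elapsed_hours / fraction_done


def hours_until_next_trickle(elapsed_hours, fraction_done, trickles_per_task=8):
    """Hours left before the next trickle, assuming evenly spaced trickles."""
    total = projected_total_hours(elapsed_hours, fraction_done)
    interval = total / trickles_per_task
    return interval - (elapsed_hours % interval)


# Figures from the post: about 16 hours in and about 9% complete.
total = projected_total_hours(16, 0.09)
print(f"projected total:  {total / 24:.1f} days")
print(f"trickle interval: {total / 8:.1f} hours")
print(f"next trickle in:  {hours_until_next_trickle(16, 0.09):.1f} hours")
```

With an 8-day run the interval works out to the 24 hours mentioned above; extrapolating from 9% at 16 hours gives a slightly shorter run and so a slightly earlier first trickle.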

SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 65536 - Posted: 9 Jun 2022, 23:23:55 UTC
Last modified: 9 Jun 2022, 23:31:28 UTC

Ugh... sorry. Dropped half a dozen WUs, they got OOM reaped when I screwed up some settings trying to optimize compute on a system. I'm just going to stop touching stuff...