Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
It took me very little time to get all the files for the one work unit I just got. I have a 75 Megabit/sec Internet connection with Verizon FiOS. Wed 08 Jun 2022 05:32:05 PM EDT | climateprediction.net | Sending scheduler request: Requested by user. Wed 08 Jun 2022 05:32:06 PM EDT | climateprediction.net | Scheduler request completed: got 1 new tasks Wed 08 Jun 2022 05:32:06 PM EDT | climateprediction.net | Project requested delay of 3636 seconds Wed 08 Jun 2022 05:32:08 PM EDT | climateprediction.net | Started download of hadam4h_a1qd_200011_5_932_012143257.zip Wed 08 Jun 2022 05:32:08 PM EDT | climateprediction.net | Started download of a1qd_932_atmos.gz Wed 08 Jun 2022 05:32:10 PM EDT | climateprediction.net | Finished download of hadam4h_a1qd_200011_5_932_012143257.zip Wed 08 Jun 2022 05:32:10 PM EDT | climateprediction.net | Started download of ic_N216_2003_11_000057_f.nc.gz Wed 08 Jun 2022 05:32:15 PM EDT | climateprediction.net | Finished download of ic_N216_2003_11_000057_f.nc.gz Wed 08 Jun 2022 05:32:15 PM EDT | climateprediction.net | Started download of ancil_PAMIP_tos_fut2CArctic_N216-clim.gz Wed 08 Jun 2022 05:32:18 PM EDT | climateprediction.net | Finished download of ancil_PAMIP_tos_fut2CArctic_N216-clim.gz Wed 08 Jun 2022 05:32:18 PM EDT | climateprediction.net | Started download of ancil_PAMIP_siconc_fut2CArctic_N216-clim.gz Wed 08 Jun 2022 05:32:20 PM EDT | climateprediction.net | Finished download of ancil_PAMIP_siconc_fut2CArctic_N216-clim.gz Wed 08 Jun 2022 05:32:20 PM EDT | climateprediction.net | Started download of so2dms_rcp45_N216_1999_2010.gz Wed 08 Jun 2022 05:32:48 PM EDT | climateprediction.net | Finished download of so2dms_rcp45_N216_1999_2010.gz Wed 08 Jun 2022 05:32:48 PM EDT | climateprediction.net | Started download of oxi.addfa.N216L38.gz Wed 08 Jun 2022 05:32:49 PM EDT | climateprediction.net | Finished download of a1qd_932_atmos.gz Wed 08 Jun 2022 05:32:49 PM EDT | climateprediction.net | Started download of ozone_rcp45_N216L38_1999_2010v2.gz Wed 08 Jun 2022 05:32:53 PM EDT | climateprediction.net | Finished download of ozone_rcp45_N216L38_1999_2010v2.gz Wed 08 Jun 2022 05:32:53 PM EDT | climateprediction.net | Started download of VOLC38_LR_N216.gz Wed 08 Jun 2022 05:32:54 PM EDT | climateprediction.net | Finished download of VOLC38_LR_N216.gz Wed 08 Jun 2022 05:33:15 PM EDT | climateprediction.net | Finished download of oxi.addfa.N216L38.gz Wed 08 Jun 2022 05:38:29 PM EDT | climateprediction.net | Starting task hadam4h_a1qd_200011_5_932_012143257_0 It has now been running a little over 8 minutes and has not crashed. |
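A minimal Python sketch for turning a BOINC event log like the one above into per-file download times. It assumes the exact line layout shown (timestamp | project | message) and the English "Started download of" / "Finished download of" wording; `parse_time` and `download_durations` are hypothetical helper names, not part of BOINC.

```python
from datetime import datetime

# Timestamp layout from the log above, e.g. "Wed 08 Jun 2022 05:32:08 PM EDT".
# The trailing zone name is dropped first because strptime's %Z only
# recognises a handful of zone names.
TIME_FORMAT = "%a %d %b %Y %I:%M:%S %p"
STARTED = "Started download of "
FINISHED = "Finished download of "


def parse_time(stamp: str) -> datetime:
    """Parse an event-log timestamp, ignoring the trailing zone name."""
    without_zone = stamp.strip().rsplit(" ", 1)[0]
    return datetime.strptime(without_zone, TIME_FORMAT)


def download_durations(log_lines):
    """Return {file name: seconds} for every file that both started and finished."""
    started = {}
    durations = {}
    for line in log_lines:
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue  # not an event-log line
        stamp, _project, message = parts
        when = parse_time(stamp)
        if message.startswith(STARTED):
            started[message[len(STARTED):].strip()] = when
        elif message.startswith(FINISHED):
            name = message[len(FINISHED):].strip()
            if name in started:
                durations[name] = (when - started[name]).total_seconds()
    return durations
```

Fed the full log above (one entry per line), it shows most files arriving within a few seconds; the slowest, a1qd_932_atmos.gz, took about 41 seconds, and the whole set finished 67 seconds after the first download started.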
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Yeah, it's better now... Yesterday, when all the WUs hit and all the starving Linux servers were fetching everything they could, it was pretty bad. I think everyone's fed now, within the bounds of the initial slug of work. And judging by how long some people keep returning credit, they've probably fetched plenty of extra days of WUs. Though if you're going to do that, benchmark first: otherwise BOINC assumes the default performance figure, thinks the machine will take 100+ days to get a WU done, and won't give you more than a full CPU set. I've got my pair of 3900Xs up and running and loaded down, with my eDRAM 5775C finishing off some other stuff (I had it loaded down with Einstein WUs) before it chews into the new stuff. IIRC, a couple of other people here have similar 3000-series Ryzens. I've somewhat arbitrarily limited mine to 12 tasks, on the logic that even with 64MB of L3 they're still pretty cache hungry. But on AMD I don't have a good way to tell whether I'd get more throughput with 24 (at least on the machine with enough RAM for it). On Intel, there's a performance counter utility that lets me monitor total instructions retired per unit time, which is throughput, so I can tune for that. But I've not found a good way to measure that on AMD/Linux yet. //EDIT: Oh hey. Run 'modprobe msr', install msr-tools, then 'rdmsr -a -x 0xC00000E9' should get you the retired instruction count for each CPU. I'll write a little wrapper for it here so I can benchmark throughput. |
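A rough Python equivalent of that 'rdmsr' approach, offered as a sketch rather than the wrapper mentioned above: it reads MSR 0xC00000E9 straight from /dev/cpu/<n>/msr (the device the msr module provides, which rdmsr also uses), samples each CPU twice and turns the difference into instructions retired per second. It assumes root, a loaded msr module, a CPU where that counter is enabled, and a 64-bit counter width; `read_msr`, `rate` and `total_rate` are hypothetical names.

```python
import os
import struct
import time

# MSR from the post above; on AMD Zen CPUs it counts retired instructions.
IR_PERF_COUNT_MSR = 0xC00000E9
# Assumed counter width, used only for wrap-around handling.
COUNTER_BITS = 64


def read_msr(cpu: int, msr: int = IR_PERF_COUNT_MSR) -> int:
    """Read one 64-bit MSR; the msr module exposes each MSR as a file offset."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def rate(before: int, after: int, seconds: float, bits: int = COUNTER_BITS) -> float:
    """Instructions retired per second between two readings, tolerating one wrap."""
    return ((after - before) % (1 << bits)) / seconds


def total_rate(cpus, interval: float = 1.0) -> float:
    """Sum of retired-instruction rates over the given CPUs, sampled about `interval` s apart."""
    cpus = list(cpus)
    start = time.monotonic()
    first = {cpu: read_msr(cpu) for cpu in cpus}
    time.sleep(interval)
    last = {cpu: read_msr(cpu) for cpu in cpus}
    elapsed = time.monotonic() - start
    return sum(rate(first[cpu], last[cpu], elapsed) for cpu in cpus)
```

As root, `total_rate(range(24))` samples all 24 threads of a 3900X over about a second; comparing that figure with 12 tasks running and with 24 is the throughput comparison described in the post.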
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I think everyone's fed now, to within the bounds of the initial slug of work. And based on how long some people keep returning credit, probably plenty of extra days of WUs. Though if you're going to do that, benchmark first - otherwise it thinks the machine will take 100+ days to get WUs done on the default performance, and you won't get more than a full CPU set." I got only one task, but I did not benchmark first. I last booted my machine about 8 days ago, so it surely ran the benchmarks then, but on the other hand, no ClimatePrediction tasks were running at the time. The one task I got was estimated at a little over 8 days to run, which is what it normally takes. It has now been running for 2 hours, with 8 days 3 hours estimated to go. I just ran the benchmarks and the estimated times to completion did not change, but maybe the benchmark is only looked at once, when the task first starts. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
That's about right. My new builds are showing "100 days" to completion. As far as I can tell, the "performance stats" only get turned into an estimated time when the task starts; the estimate isn't updated once it has been created. Not a problem: it affects only the estimates, not actual performance. |
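A deliberately simplified Python model of why those estimates can look so odd, assuming the estimate is just the task's estimated floating-point work divided by the host's speed figure, worked out once when the task is set up (the real BOINC client takes more into account than this). The work size and both speed figures are made-up numbers, chosen so that a benchmarked speed gives the usual 8 days.

```python
SECONDS_PER_DAY = 86_400

# Hypothetical work size for one task, in floating-point operations.
FPOPS_EST = 2.7648e15


def estimated_days(fpops_est: float, flops_per_second: float) -> float:
    """Simplified estimate: total work divided by the assumed speed."""
    return fpops_est / flops_per_second / SECONDS_PER_DAY


# Hypothetical speed figure from a real benchmark run.
benchmarked = estimated_days(FPOPS_EST, 4.0e9)
# Hypothetical, much lower default speed used before any benchmark has run.
unbenchmarked = estimated_days(FPOPS_EST, 3.2e8)

print(f"benchmarked: {benchmarked:.0f} days, unbenchmarked: {unbenchmarked:.0f} days")
```

Because the estimate scales inversely with the speed figure, a default that is 12.5 times too pessimistic turns 8 days into 100; and if the figure is captured only once per task, rerunning the benchmark later leaves existing estimates alone, as described in the posts above.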
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Mine are estimating just under 12 days each. My estimate is: Who cares, I've got some more work at last. :) |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
And some good news. @dave18874629: as @suzannerosier1 says, we hope to release some Windows work soon for the new NZ25 region, and we also have work in the pipeline to develop an EAS25 region, which will also have work coming. However, I am willing to bet that the servers get emptied pretty quickly once these appear. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Aw yeah. Make the machines work for their pay! :D I got my AMD instruction counter working; I'll clean it up a bit and stick it on GitHub soon. I'd like to figure out how to get DRAM read/write bandwidth while I'm in here... |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"They haven't updated the 32-bit Windows BOINC client for 3.5+ years, or the 32-bit Linux version for 7.5+ years. I guess they think that if you are running an older operating system, you can download an older version of BOINC and make do. The problem came about last fall, when the certificates installed with older versions of Windows BOINC expired. They came out with a new 64-bit version with an updated certificate list, but no new 32-bit version. They do have a certificates file to download at the top of this page, https://boinc.berkeley.edu/download_all.php, if one is still running the older Windows versions." I remember where I got this from now: I got the impression BOINC is (at least partially) still 32-bit because it fails to detect that a GPU has more than 4GB. The 64-bit projects that use the GPU can see all the VRAM, though. |
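The thread doesn't say how BOINC stores GPU memory internally, so this Python sketch only illustrates the general limit behind that impression: a value kept in an unsigned 32-bit field can't exceed 4 GiB minus one byte, so larger sizes either wrap around or get clamped, depending on the code. `wrap_uint32` and `clamp_uint32` are hypothetical helpers.

```python
GIB = 1 << 30
UINT32_MAX = 0xFFFF_FFFF


def wrap_uint32(n: int) -> int:
    """Keep only the low 32 bits, as C unsigned arithmetic would."""
    return n & UINT32_MAX


def clamp_uint32(n: int) -> int:
    """Saturate at the largest value a 32-bit unsigned field can hold."""
    return min(n, UINT32_MAX)


for size_gib in (2, 4, 6, 8, 11):
    real = size_gib * GIB
    print(f"{size_gib:>2} GiB -> wrapped {wrap_uint32(real) / GIB:.2f} GiB, "
          f"clamped {clamp_uint32(real) / GIB:.2f} GiB")
```

Either way, a card with more than 4 GiB is misreported, while a 64-bit application reading the same device sees the full amount.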
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"And some good news." Cool! I have 90 CPU cores waiting here. They're all checking for work, but I don't know how often; I think it's once an hour unless they're busy with other projects, which they always are. I have a small buffer, though. [Engaging tickle mechanism] |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"And some good news." And the bad news is that the three testing tasks have completed, but the zips are not uploading. I have informed the project but am still waiting for a reply. Most likely Andy just needs to kick something to restart it. This is unlikely to delay the (as yet unknown) release date of the main-site Windows tasks by more than a day or two at most. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Do you know the rough date for the Windows tasks? Are we talking a week or a month? Also, is the server status page accurate? It claims there are only 4 Linux users doing 4186 tasks! |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Is the server status page accurate? It claims there are only 4 Linux users doing 4186 tasks!" That means only 4 Linux users have returned results in the last 24 hours (or something like that); it is not the number of (Linux) users working on work units. My machine is currently working on three ClimatePrediction work units, 2 MilkyWay work units (one of which uses 4 processor cores), |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
But surely, if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloads and trickle-ups? |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,722,381 RAC: 7,664 |
"I remember where I got this from now, I got the impression Boinc is (at least partially) still 32 bit because it fails to detect a GPU is over 4GB. The 64 bit projects that use the GPU can see all the VRAM though." That bit, at least, has been corrected in the forthcoming v7.20 release (a preview is available for testing; see the BOINC message boards). There were earlier problems with the 64-bit SSL security libraries failing to run on some low-power Intel processors, but that was corrected two or three years ago. I wonder if the WINE error could have been related to that one? |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,722,381 RAC: 7,664 |
"... but the zips are not uploading. I have informed the project but still waiting for a reply. Most likely Andy just needs to kick something to restart it. Unlikely to delay the as yet unknown date of the main site Windows tasks by more than a day or two at the most." Previous practice was to upload the zips to a server located in the region under investigation (NZ, in this case), so the kick might be better directed there... Have you checked the <http_debug> log flags to see what the exact problem is? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"But surely if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloading and trickle ups?" I'm not 100% sure, but I am fairly certain it is only the number who have reported completed tasks. That is based on the dev site, which uses the same server code and where the numbers are low enough that I can be sure trickle-ups and zips don't count towards that number. Edit: so neither of the two most recent batches, which will take about 9 days on my Ryzen, will show up in that metric for a while yet. |
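A small Python model of the metric as described in this post, assuming the status page counts distinct users who reported a completed task in the last 24 hours (the real server query may differ). Running tasks, trickle-ups and uploaded zips never enter `reports`, which is how thousands of tasks in progress can sit beside a count of 4. `active_users` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def active_users(reports, now, window=timedelta(hours=24)):
    """Count distinct users with at least one completed-task report inside the window.

    `reports` is an iterable of (user_id, reported_at) pairs for completed tasks only.
    """
    return len({user for user, reported_at in reports if now - reported_at <= window})


now = datetime(2022, 6, 10, 12, 0)
reports = [
    ("alice", now - timedelta(hours=3)),
    ("alice", now - timedelta(hours=5)),  # same user again: counted once
    ("bob", now - timedelta(days=3)),     # outside the 24-hour window
]
print(active_users(reports, now))
```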
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Previous practice was to upload the zips to a server located in the region under investigation - NZ, in this case. The kick might be better directed there..." Usually Andy then has to contact them to tell them it needs kicking. "Have you checked <http_debug> log flags to see what the exact problem is?" upload4. I posted a long extract from the event log, with http_debug, http_xfer_debug and file_xfer_debug all checked, on the Trello board for the batch, though Andy probably won't see it unless Sarah or the batch owner points him to it. Edit: I have short-circuited that bit and emailed Andy. Edit3: It's 05:45 in NZ right now, so we might have to wait a bit before the machine gets kicked. Edit2: Andy has emailed the sysadmin for the relevant machine. |
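For anyone wanting to capture the same kind of log, a Python sketch that writes the transfer-related log flags into a cc_config.xml. As far as I know, BOINC's names for them are http_debug, http_xfer_debug and file_xfer_debug inside a <log_flags> section; the helper name is hypothetical, and writing the file this way replaces any existing cc_config.xml settings, so merge by hand if you already have one.

```python
import xml.etree.ElementTree as ET

TRANSFER_FLAGS = ("http_debug", "http_xfer_debug", "file_xfer_debug")


def debug_cc_config(flags=TRANSFER_FLAGS) -> str:
    """Build a minimal cc_config.xml body with the given log flags switched on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    ET.indent(root)  # Python 3.9+: pretty-print for easier hand editing
    return ET.tostring(root, encoding="unicode")


print(debug_cc_config())
```

Save the output as cc_config.xml in the BOINC data directory, then have the client reread it (Options > Read config files in BOINC Manager, or `boinccmd --read_cc_config`). The same flags can also be ticked in the Manager's event log options dialog.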
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Do you know the rough date of Windows tasks? Are we talking a week or a month?" Nothing more than the "soon" I posted below (or above, if you don't have posts sorted "most recent first"). |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"But surely if there are 4186 in progress, more than 4 of you have contacted the server in the last day? Does it not include downloading and trickle ups?" I now have three N216 work units running. IIRC, they trickle after every 1/8 of the run, so with a run time of about 8 days, my first trickle should come about 24 hours after the task started. As it is, the oldest of my three work units is about 16 hours in and about 9% complete. It has been so long that I do not remember whether trickles count on the server page. |
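The trickle arithmetic in this post, worked through in Python under its own assumptions (one trickle per 1/8 of the run, and the task keeps progressing at the same rate); the function names are hypothetical.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Extrapolate the full run time from progress so far, assuming a constant rate."""
    return elapsed_hours / fraction_done


def trickle_hours(total_hours: float, trickles: int = 8) -> list[float]:
    """Hours after start at which each 1/trickles mark of the run is reached."""
    return [total_hours * k / trickles for k in range(1, trickles + 1)]


# The post's reference point: an 8-day (192-hour) run.
print(trickle_hours(8 * 24)[0])

# The oldest task: about 16 hours in and about 9% complete.
total = projected_total_hours(16, 0.09)
print(f"projected run: {total:.0f} h ({total / 24:.1f} days), "
      f"first trickle near {trickle_hours(total)[0]:.0f} h")
```

An 8-day run puts the first trickle at 24 hours, matching the post; at the observed rate, this particular task would finish in roughly 7.4 days and trickle for the first time at about 22 hours.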
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Ugh... sorry. I dropped half a dozen WUs; they got killed by the OOM killer when I screwed up some settings trying to optimize compute on one system. I'm just going to stop touching stuff... |
©2024 cpdn.org