Message boards : Cafe CPDN : World Community Grid mostly down for 2 months while transitioning
Author | Message |
---|---|
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Sigh. WCG has been hard-down again, due to some hard drive failure or other: https://twitter.com/WCGrid Before that, they were dealing with severe shortages of WUs, because the scripts tried to predict how much work would be done based on past performance. Except, if past performance was low due to a lack of WUs to send, it would predict low future performance and not generate many WUs. I'm doing some video transcode on my boxes right now, but once that's done, I suppose they go into shutdown/archive mode and I heat on resistors. |
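The feedback loop described in the post above can be sketched as a toy model. This is only an illustration of the failure mode, not WCG's actual scripts: the function name, the "generate exactly what was completed yesterday" rule, and all numbers are assumptions.

```python
def simulate_generation(initial_supply, capacity, days):
    """Toy model: each day's new work equals the previous day's completed work.

    initial_supply -- work units available on day one (hypothetical)
    capacity       -- work units the volunteers could finish per day (hypothetical)
    """
    supply = initial_supply
    completed_history = []
    for _ in range(days):
        # Volunteers can only finish what they are actually sent.
        completed = min(supply, capacity)
        completed_history.append(completed)
        # The generator predicts tomorrow's demand from today's throughput.
        supply = completed
    return completed_history


# Supply starts far below capacity, so throughput never recovers.
print(simulate_generation(initial_supply=100, capacity=1000, days=5))
# [100, 100, 100, 100, 100]
```

In this sketch the volunteers could process 1000 units a day, but because the prediction only ever sees the starved throughput of 100, the generator keeps producing 100.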
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"sigh" There are so many other useful things you can run. Assuming you mean CPUs, Rosetta just got 9 million tasks in. Sidock's got a load of longer tasks. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
"There are so many other useful things you can run. Assuming you mean CPUs, Rosetta just got 9 million tasks in. Sidock's got a load of longer tasks." Yeah, all I have are CPUs. I just don't like running "math for the sake of doing compute" sort of projects like prime finders or such. Where are you seeing 9M Rosetta tasks? http://boinc.bakerlab.org/server_status.php argues a lot lower, though I suppose I can toss their stuff in my mix for when everything else is out. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Yeah, all I have are CPUs. I just don't like running 'math for the sake of doing compute' sort of projects like prime finders or such." Agreed. But there's loads of biology and physics to do. If you can run VB stuff, go on LHC and discover subatomic particles. They're why we have MRI scanners. "Where are you seeing 9M Rosetta tasks? http://boinc.bakerlab.org/server_status.php argues a lot lower, though I suppose I can toss their stuff in my mix for when everything else is out." Unlike most projects, they're honest and admit the real full queue. It's on the home page. Server status just shows the RAM buffer. No idea why projects choose to hide a key piece of information from their volunteers. |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 14,986,850 RAC: 9,927 |
"sigh" To say that WCG has struggled since leaving IBM would be an understatement. Some projects that have plenty of CPU work that I find interesting enough to run, and are not just pure math projects, are: Rosetta, Einstein, Universe, MilkyWay, LHC. Yes, as mentioned above, Rosetta is a bit unique in that its home page has a stats section that lists, among other things, total queued jobs. The traditional server status page just shows the buffer from which clients get tasks; they seem to keep it at around 30k. Einstein also has that kind of info on their server status page, broken down by sub-project, although that section sometimes disappears. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Rosetta just got 9 million tasks in." Rosetta has only this many at the moment. Tasks in progress: 329539. Tasks ready to send: 29905. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Rosetta just got 9 million tasks in." "Rosetta has only this many at the moment." Incorrect, it's on the main page. The server status page is a poor version of this. No idea why they keep it there at all. No idea why some projects don't have an honest queue displayed. What are they hiding? |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
"The server status page is a poor version of this. No idea why they keep it there at all. No idea why some projects don't have an honest queue displayed. What are they hiding?" Can't comment on other projects, especially those I have had little involvement with, but CPDN only updates the server status page every couple of hours, which means small batches of work can be gone before they show up on the page. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"The server status page is a poor version of this. No idea why they keep it there at all. No idea why some projects don't have an honest queue displayed. What are they hiding?" "Can't comment on other projects, especially those I have had little involvement with, but CPDN only updates the server status page every couple of hours, which means small batches of work can be gone before they show up on the page." This is common - the updating frequency is low on other projects too. Rosetta is odd, since the updating time of server status and the updating time of the main page are different, so one is usually ahead of the other, but not always the same one. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
"The server status page is a poor version of this. No idea why they keep it there at all. No idea why some projects don't have an honest queue displayed. What are they hiding?" "Can't comment on other projects, especially those I have had little involvement with, but CPDN only updates the server status page every couple of hours, which means small batches of work can be gone before they show up on the page." "This is common - the updating frequency is low on other projects too. Rosetta is odd, since the updating time of server status and the updating time of the main page are different, so one is usually ahead of the other, but not always the same one." The standard BOINC server status page can only see work that has been prepared for sending out, as it can only see the BOINC database. That's probably the most honest indication of available work for most projects, since the total amount of work may be indeterminate for one reason or another! (And you can't hide what you don't know...) There are some projects out there that have an exactly known number of work units (if all goes well) -- individual batches here at CPDN may be examples, the ARP1 sub-project at WCG is another [1], and I'm sure there are many more. However, more common seem to be projects such as MilkyWay (both sub-projects) or WCG projects such as MCM1, OPN1/OPNG and SCC1 [2] that run until some target is hit [3], whilst there may also be genuinely open-ended ones (lots of "mathematical" projects?) I'd actually be interested to know where Rosetta gets that very large number from (and its likely accuracy) -- it's almost certainly not coming from anything in the BOINC database itself, otherwise it could possibly update at the same time(s) as the server status page :-) Cheers - Al. P.S. I wonder if knowing how much work is available long-term is only of major interest to badge-hunters? The clamour at WCG when a project was known to be nearing its end used to be quite something to behold... :-)
[1] The only argument about the number of work units was whether the year of data (at two days per "generation") would have 364 days or 366 -- a difference of 35609 work units!
[2] In the case of SCC1 there has been such a long hiatus that many think it won't return, but childhood cancers are a key research area so I mention it anyway...
[3] Those projects process data for a given "target" for as long as the scientists deem appropriate (e.g. MilkyWay runs a set of streams "until converged"), then they move on to another target. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
"If you can run VB stuff, go on LHC and discover subatomic particles. They're why we have MRI scanners." I'll have to see. I don't have a huge amount of bandwidth, and most of my compute machines are RAM starved if they're running a VM per core - I built for around 1.5GB/core, and that's been a limit, though I'm hitting RAM bandwidth/latency/cache limits anyway before I run out of RAM, based on overall system throughput measurements. That Rosetta seems to have useful work is good, I'll let some of that run. I think they ran out of WUs some while back during Covid, didn't they? |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"I'd actually be interested to know where Rosetta gets that very large number from (and its likely accuracy) -- it's almost certainly not coming from anything in the BOINC database itself, otherwise it could possibly update at the same time(s) as the server status page :-)" The scientists presumably create a huge batch of several million targets and put them in a folder somewhere on the server. The server keeps track of how many are left. The actual BOINC server presumably can't handle that many, so they're spoon fed to the useless POS software a few thousand at a time. "P.S. I wonder if knowing how much work is available long-term is only of major interest to badge-hunters? The clamour at WCG when a project was known to be nearing its end used to be quite something to behold... :-)" I don't hunt badges, I'm doing it for the good of mankind. But it's nice to know how much work is available at a project so I can choose which other ones to run. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"I'll have to see, I don't have a huge amount of bandwidth" LHC's CMS uses a lot of bandwidth. I have 122 CPU cores, but only 7Mbit upload. To run all that on LHC's CMS, I'd need three times the bandwidth, about 20Mbit. But the other subprojects there are not bandwidth hungry. "and most of my compute machines are RAM starved if it's running a VM per core - I built for around 1.5GB/core, and that's been a limit, though I'm hitting RAM bandwidth/latency/cache limits anyway before I run out of RAM, based on overall system throughput measurements." LHC's ATLAS tasks can run on up to 8 cores. I think the formula is something like 2GB per task + 1GB per core. So an 8-core task needs about 10GB, or 1.25GB per core, which should be within your amount. "That Rosetta seems to have useful work is good, I'll let some of that run. I think they got run out of WUs some while back during Covid, didn't they?" They had an enormous batch of python apps running, which not only required VB but also a modern CPU with AVX, so not many people were able to run them. The trouble was the tasks were also sent to computers that couldn't run them, and there was a mess. It took a long conversation between me and several other users to determine which instruction was required in the CPU to run them without failing. The other problem was that there were no admins in the forum to listen. No way to contact anyone at all. There is now one scientist in there though. Steven Rettie I think. And their current tasks are not VB, although they're still working on fixing the bugs in the VB one so it may be back in full force soon. |
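The memory rule of thumb quoted in the post above ("something like 2GB per task + 1GB per core") can be checked against a per-core RAM budget with a few lines. This is a minimal sketch: the formula is the poster's recollection, not an official LHC@home figure, and the function names are hypothetical.

```python
def atlas_task_ram_gb(cores, base_gb=2.0, per_core_gb=1.0):
    """Estimated RAM for one multi-core task, using the rule of thumb from the post."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return base_gb + per_core_gb * cores


def fits_ram_budget(cores, budget_gb_per_core):
    """True if the task's RAM per core stays within the machine's per-core budget."""
    return atlas_task_ram_gb(cores) / cores <= budget_gb_per_core


print(atlas_task_ram_gb(8))      # 10.0
print(fits_ram_budget(8, 1.5))   # True  (10 GB / 8 cores = 1.25 GB per core)
print(fits_ram_budget(2, 1.5))   # False (4 GB / 2 cores = 2 GB per core)
```

Under this assumed formula the fixed 2GB overhead is spread across the cores, so wider tasks are cheaper per core; a machine built for about 1.5GB/core would need tasks of at least 4 cores.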
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Well, those 9M Rosetta tasks have been disappearing at a rate of around 500k/day. Down to 1.5M left... not much more work to be found there, unless they've got new stuff coming. I suppose I'll slosh compute over to F@H and help run them out of tasks afterwards... or get off my rear and get some more video transcode work lined up for my boxes. I've got a bunch of the Google Compute Engine C3 beta machines (they're free while in beta!) chewing on R@H and F@H too right now. ;) Something like 44 high end Intel CPU cores... |
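The drain rate mentioned in the post above gives a quick back-of-the-envelope estimate of how long the remaining queue lasts. This sketch assumes a constant consumption rate, which is a simplification, and the function name is hypothetical.

```python
def days_to_process(task_count, tasks_per_day):
    """Days needed to work through task_count at a constant daily rate."""
    if tasks_per_day <= 0:
        raise ValueError("tasks_per_day must be positive")
    return task_count / tasks_per_day


# Figures from the post: 9M tasks originally, 1.5M left, roughly 500k done per day.
print(days_to_process(1_500_000, 500_000))              # 3.0  days of work left
print(days_to_process(9_000_000 - 1_500_000, 500_000))  # 15.0 days to get this far
```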
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Can't see "beta", but I have found "free trial" with $400 credit. I've signed up and will attempt to use it. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
I have no idea how to make this work. I tried to set up an instance, and was told after I'd filled everything in, I couldn't use Windows in the free trial. Fine, I made a new instance with Ubuntu, then it tells me (right at the end) the free trial is limited to 32 CPU cores. Fine, start again! 32 cores on Ubuntu. No, now I can only use 8?! So I ask on the Google Cloud Community, and it won't accept any username I choose. Keeps saying invalid characters even though I just use some lower case letters. Anyone any idea how to make it work at all? |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
I had a 24 core per region quota limit, though I've not tried to raise it. I was able to just create some 4/8/22 core VMs and they're purring away nicely. Rosetta@Home is out of tasks, though there are still some retries going around. I suppose next steps to run out of work would be Folding@Home CPU or maybe Einstein@Home, though I don't find pulsars/neutron stars/etc. particularly interesting projects, and certainly not of much "Earthly importance." |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
"I had a 24 core per region quota limit, though I've not tried to raise it. I was able to just create some 4/8/22 core VMs and they're purring away nicely." The next project to run out of work will be TN-Grid in about 10 days' time :-( Of my 5 projects, that will be 4 of them without work. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"I had a 24 core per region quota limit, though I've not tried to raise it. I was able to just create some 4/8/22 core VMs and they're purring away nicely." You could try Sidock, Denis, QuChempedia, World Community Grid, or Asteroids in case one hits us! |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
It seems subscriptions aren't being emailed out. Who's in charge of the email server? |
©2024 cpdn.org