Tasks by application = hoarding
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I am always amazed to see four failures, and yet my machine has no trouble processing the work unit. Not so surprising when most of the failures are due to missing libraries. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I am always amazed to see four failures, and yet my machine has no trouble processing the work unit. I could make a statistically sloppy assumption and propose that 4/5 of the machines lack proper libraries. But it is so frequent. But CPDN has always used 32-bit libraries, so are there that many volunteers that never ever check their results? |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,083,753 RAC: 15,077 |
Missing libraries and/or weird file systems that the BOINC client can't decipher. |
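One way a volunteer might check for the missing 32-bit libraries mentioned above is to run `ldd` on the project's executable and look for "not found" lines. Here is a minimal Python sketch, assuming the usual `name => not found` format that `ldd` prints for unresolved libraries. The sample text and the library names in it are made up for illustration and do not come from a real CPDN host.

```python
def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


# Hypothetical ldd output, for illustration only.
SAMPLE = """\
\tlinux-gate.so.1 (0xf7f1c000)
\tlibstdc++.so.6 => not found
\tlibm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7d00000)
\tlibz.so.1 => not found
"""

print(missing_libraries(SAMPLE))
```

In practice the text would come from running `ldd` against the downloaded model binary. Any library listed as missing would then need its 32-bit package installed from the distribution's repositories.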
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
so are there that many volunteers that never ever check their results? Unfortunately, yes. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,083,753 RAC: 15,077 |
so are there that many volunteers that never ever check their results? And by inference their tasks. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
The big difference for me is, I have never seen the server running out of Linux Tasks. Give it a day or two at this rate? It's running out in a hurry. Clearly an awful lot of the Linux tasks are being chewed through, though I can't see how many of the WUs are actually completed successfully vs hitting all the failures from libraries or such (I can see that a lot of the stuff I'm working on has plenty of other failures, though they're not all library failure - some download errors, some weird filesystem errors, etc). I'm draining out some of my compute boxes as we head into winter (letting them finish CPDN tasks and not pull new ones), since I won't have the solar surplus to run the long tasks nearly as well, and I'll shift to WCG or such for heating (shorter tasks, and they handle machine shutdowns properly - suspend is fine in the summer, but draws enough more watts than shutdown in the winter to be annoying). I've got a perfectly good resistive heater, but it feels wrong to do that when I can be doing something useful with the power in the process anyway. But, in terms of credit statistics (which seems a better way to analyze work done): https://www.cpdn.org/top_users.php I turn in around 30k "recent average credit," and I'm returning about half a WU per day, ish. Depending on the day and sun and such, but in the range of 0.5 to 1 WU/day. On the leaders page, there are about 900k "recent average credit" points showing for the first 20 users, plus plenty of other users. So the front page alone, that's around 30 N216/day of compute, plus all the other users. That seems a bit low for how rapidly WUs are being chewed through (seems to be around 100/day?), but I don't know if "hoarding" is to blame so much as machines rapidly failing a lot of WUs. I just don't think I've got the ability to see how many WUs fail for "science" reasons (negative theta or pressure, etc), vs "compute" reasons (no libraries or such). |
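The back-of-envelope estimate in the post above can be written out as a short calculation. This is only a sketch. It assumes recent average credit scales linearly with completed work units, and that the top 20 users earn credit at the same rate per work unit as the poster. The function name is made up for illustration.

```python
def estimated_wu_per_day(group_rac: float, ref_rac: float, ref_wu_per_day: float) -> float:
    """Scale a reference host's WU/day by the ratio of recent average credit."""
    return group_rac * ref_wu_per_day / ref_rac


# Figures quoted in the post: about 30k RAC for 0.5 to 1 WU/day,
# and about 900k RAC summed over the first 20 users.
low = estimated_wu_per_day(900_000, 30_000, 0.5)
high = estimated_wu_per_day(900_000, 30_000, 1.0)
print(f"top-20 estimate: {low:.0f} to {high:.0f} WU/day")
```

With the poster's own range of 0.5 to 1 WU/day, the top 20 come out at roughly 15 to 30 WU/day, so the "around 30" figure sits at the upper end of that range.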
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
The big difference for me is, I have never seen the server running out of Linux Tasks. My machine has only been up for a few days, but it does chew up PIDs at a great rate. But once a process has completed and been waited for, its entry can be deleted from the process table and its PID can be re-used. You really need a PID only for each process in the process table. In the old days, UNIX and Linux could execute a small number (a few hundred) of processes at a time, and the process table was searched sequentially. Later this was increased to something like 32000 PIDs, and IIRC, they were hashed instead of sequentially searched. So now, with 64-bit machines, perhaps they hash them all the time. PID 1 is the equivalent of the old init process that starts off all the others; its name is systemd. PID 2043 is started by systemd and is the BOINC client, which starts all the other BOINC tasks. So, as you can see in the list below, most of the BOINC tasks were started by the BOINC client, PID 2043. The only difference is the CPDN tasks: the BOINC client starts the processes at the bottom of this list, and these processes then start the actual worker processes, the ones at the top of the list. But my system is running normally and not doing much else other than running BOINC most of the time.
top - 17:44:52 up 4 days, 10:20, 1 user, load average: 8.20, 8.39, 8.55
Tasks: 452 total, 9 running, 440 sleeping, 3 stopped, 0 zombie
%Cpu(s): 0.2 us, 2.9 sy, 47.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
MiB Mem : 63902.3 total, 1470.5 free, 12065.6 used, 50366.2 buff/cache
MiB Swap: 15992.0 total, 15977.2 free, 14.8 used. 50988.2 avail Mem
PID PPID USER PR NI S RES SHR %MEM %CPU P TIME+ COMMAND
308902 308888 boinc 39 19 R 1.3g 19904 2.1 99.6 2 1443:11 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
331381 331356 boinc 39 19 T 1.3g 19748 2.1 0.0 7 975:29.32 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
312209 312164 boinc 39 19 T 1.3g 19940 2.1 0.0 7 944:49.84 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
371391 371381 boinc 39 19 T 1.3g 19940 2.1 0.0 5 319:35.24 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
417326 2043 boinc 39 19 R 866900 88936 1.3 99.3 7 159:57.54 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
412389 2043 boinc 39 19 R 785968 76036 1.2 99.6 5 221:45.13 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
411721 2043 boinc 39 19 R 781688 76036 1.2 99.5 6 231:42.70 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
416241 2043 boinc 39 19 R 759928 28688 1.2 99.5 3 175:16.58 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+
391074 2043 boinc 39 19 R 715908 25232 1.1 99.8 8 513:27.05 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+
425305 2043 boinc 39 19 R 107748 2676 0.2 99.5 1 41:11.11 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_a+
427839 2043 boinc 39 19 R 68004 2456 0.1 99.6 0 7:01.12 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_m+
2043 1 boinc 30 10 S 37208 17740 0.1 0.1 5 45382:39 /usr/bin/boinc
371381 2043 boinc 39 19 S 17844 17176 0.0 0.0 14 0:29.50 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
312164 2043 boinc 39 19 S 17764 17096 0.0 0.0 10 0:47.00 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
308888 2043 boinc 39 19 S 17668 17004 0.0 0.1 10 0:45.53 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
331356 2043 boinc 39 19 S 17668 17004 0.0 0.0 14 0:46.20 ../../projects/climateprediction.net/hadam4_8.52_i686-p+ |
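The parent/child relationships in the listing above can be made explicit by grouping each process under its PPID. The Python sketch below copies the PID, PPID and truncated command name from the `top` output. The helper names are made up for illustration.

```python
from collections import defaultdict

# (PID, PPID, truncated command) copied from the top listing above.
PROCS = [
    (308902, 308888, "hadam4_um+"),
    (331381, 331356, "hadam4_um+"),
    (312209, 312164, "hadam4_um+"),
    (371391, 371381, "hadam4_um+"),
    (417326, 2043, "rosetta_4.20_+"),
    (412389, 2043, "rosetta_4.20_+"),
    (411721, 2043, "rosetta_4.20_+"),
    (416241, 2043, "wcgrid_arp1_w+"),
    (391074, 2043, "wcgrid_arp1_w+"),
    (425305, 2043, "wcgrid_opn1_a+"),
    (427839, 2043, "wcgrid_mcm1_m+"),
    (2043, 1, "/usr/bin/boinc"),
    (371381, 2043, "hadam4_8.52_i686-p+"),
    (312164, 2043, "hadam4_8.52_i686-p+"),
    (308888, 2043, "hadam4_8.52_i686-p+"),
    (331356, 2043, "hadam4_8.52_i686-p+"),
]


def children_by_parent(procs):
    """Map each PPID to the list of PIDs it started."""
    tree = defaultdict(list)
    for pid, ppid, _name in procs:
        tree[ppid].append(pid)
    return tree


def tree_lines(procs, root, depth=0):
    """Return indented 'PID command' lines for root and all its descendants."""
    names = {pid: name for pid, _ppid, name in procs}
    children = children_by_parent(procs)
    lines = ["  " * depth + f"{root} {names[root]}"]
    for child in sorted(children.get(root, [])):
        lines.extend(tree_lines(procs, child, depth + 1))
    return lines


print("\n".join(tree_lines(PROCS, 2043)))
```

Printed this way, the BOINC client (PID 2043) is the direct parent of the Rosetta and WCG tasks and of the four `hadam4_8.52` processes, and each of those four has one `hadam4_um` worker beneath it.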
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
I was talking about the available tasks for download here... https://www.cpdn.org/server_status.php Nothing to do with running out of PIDs. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And they're all gone. There is one showing, but according to sources the mods have, it seems to be a ghost. I'll let the project know, and maybe the list can be "zeroed". |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And now there's 8 waiting, so it looks like they're failures. That makes it harder to see which ones they are. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,729,836 RAC: 7,099 |
so are there that many volunteers that never ever check their results? Looking at the 'Top participants' statistics page, user #19 is Science United. Science United is an anonymising account manager, which appears to have been deliberately designed by David Anderson to disconnect volunteers from the science and technicalities of the projects they run. Science United users can read our forums here (as any member of the public can), but can't ask questions or seek help. They aren't given a password to log into the account. Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment. |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,403,841 RAC: 10,259 |
Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment. With a few minutes of spare time ... rough orders of magnitude: For the last 30 days ... 847 devices have made contact. Of the first 100 computers with a recent credit number, about 75% are M$ and 25% are Linux. Scanning down the list for credits, there are a few WAH returns from M$ devices and six Linux devices with more than 1,000 recent credit. These six Linux devices are trickling 13 tasks to credit with 11 other tasks in the wings. More than 90% of the 847 do not appear to be doing anything useful, unsurprisingly for the 75% M$ cohort. Since 2019 ... the user appears to have 4470 devices attached. Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Since 2019 ... the user appears to have 4470 devices attached. I think there is a case for deleting the SU account. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,729,836 RAC: 7,099 |
Unfortunately, it'll keep getting re-created like a bad penny. Since 2019 ... the user appears to have 4470 devices attached. I think there is a case for deleting the SU account. I think the only guaranteed solution is for BOINC project administrators - collectively - to speak truth to power. They'll have to decide whether the 'power' in question is David Anderson in person, the Regents of the University of California corporately, or both. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Unfortunately, it'll keep getting re-created like a bad penny. Since 2019 ... the user appears to have 4470 devices attached. I think there is a case for deleting the SU account. Power often just ignores truth and goes on doing as it pleases. |
Send message Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
Well, there is nothing left to hoard. I thought I would never see the day when Linux tasks would be zero. :) Congratulations. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Me either... Just processing through the last of the batches, I suppose. I've kind of built out some of my solar powered BOINC boxes for CPDN specifically - the eDRAM builds (huge L4 cache) are for optimizing the compute of these memory hungry beasties. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
There have been no new work units for a few days, but I still get resends to keep my VMs running. Let's see when new work becomes available. |
Send message Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
KAMasud: The problem is that, according to the site data, I have thirteen WUs in progress, whereas I actually have five in progress. Eight WUs are missing (an Internet black hole grabbed them). If I remove my computers from the project and then re-add them, will that solve the problem? The i7-8750H has three WUs while the i7-10750 has two WUs. Linux in VB. Then I have three being shown as validation pending? Now, for the Africa Rainfall Project, why is my event log saying there are GPU WUs available but they need my Intel GPU and not my regular Nvidia? Never mind, not a life and death situation. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Then I have three being shown as validation pending? There are a few BOINC options not used by this project; validation pending is one of them. So what's recorded against this option for each task is whatever was there when the task finished. So ignore it. |
©2024 cpdn.org