Thread 'Tasks by application = hoarding'

Author	Message
Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64471 - Posted: 16 Sep 2021, 13:49:15 UTC I am always amazed to see four failures, and yet my machine has no trouble processing the work unit. Not so surprising when most of the failures are due to missing libraries.

Jean-David Beyer Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 64472 - Posted: 16 Sep 2021, 20:36:30 UTC - in response to Message 64471. I am always amazed to see four failures, and yet my machine has no trouble processing the work unit. Not so surprising when most of the failures are due to missing libraries. I could make a statistically sloppy assumption and propose that 4/5 of the machines lack proper libraries. But it is so frequent. But CPDN has always used 32-bit libraries, so are there that many volunteers that never ever check their results?

Alan K Joined: 22 Feb 06 Posts: 491 Credit: 31,033,903 RAC: 14,766	Message 64473 - Posted: 16 Sep 2021, 22:59:29 UTC - in response to Message 64471. Missing libraries and/or weird file systems that the BOINC client can't decipher.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64474 - Posted: 17 Sep 2021, 2:06:16 UTC - in response to Message 64472. so are there that many volunteers that never ever check their results? Unfortunately, yes.

Alan K Joined: 22 Feb 06 Posts: 491 Credit: 31,033,903 RAC: 14,766	Message 64476 - Posted: 17 Sep 2021, 23:16:07 UTC - in response to Message 64474. so are there that many volunteers that never ever check their results? Unfortunately, yes. And by inference their tasks.

SolarSyonyk Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463	Message 64510 - Posted: 28 Sep 2021, 16:36:08 UTC - in response to Message 64448. The big difference for me is, I have never seen the server running out of Linux Tasks.
Give it a day or two at this rate? It's running out in a hurry. Clearly an awful lot of the Linux tasks are being chewed through, though I can't see how many of the WUs are actually completed successfully vs hitting all the failures from libraries or such (I can see that a lot of the stuff I'm working on has plenty of other failures, though they're not all library failures - some download errors, some weird filesystem errors, etc).
I'm draining out some of my compute boxes as we head into winter (letting them finish CPDN tasks and not pull new ones), since I won't have the solar surplus to run the long tasks nearly as well, and I'll shift to WCG or such for heating (shorter tasks, and they handle machine shutdowns properly - suspend is fine in the summer, but draws enough more watts than shutdown in the winter to be annoying). I've got a perfectly good resistive heater, but it feels wrong to do that when I can be doing something useful with the power in the process anyway.
But, in terms of credit statistics (which seem a better way to analyze work done): https://www.cpdn.org/top_users.php
I turn in around 30k "recent average credit," and I'm returning about half a WU per day, ish. Depending on the day and sun and such, but in the range of 0.5 to 1 WU/day. On the leaders page, there are about 900k "recent average credit" points showing for the first 20 users, plus plenty of other users. So for the front page alone, that's around 30 N216/day of compute, plus all the other users. That seems a bit low for how rapidly WUs are being chewed through (seems to be around 100/day?), but I don't know if "hoarding" is to blame so much as machines rapidly failing a lot of WUs.
I just don't think I've got the ability to see how many WUs fail for "science" reasons (negative theta or pressure, etc), vs "compute" reasons (no libraries or such).
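The back-of-envelope estimate in the post above can be written out as a short calculation. This is only a rough sketch: it assumes credit scales linearly with completed work units, which is an assumption made here for illustration and not something the thread confirms. The function name estimate_wu_per_day is made up; the figures (30k RAC for 0.5 to 1 WU/day, about 900k RAC for the first 20 users, roughly 100 WU/day observed) are the ones quoted in the post.

```python
def estimate_wu_per_day(total_rac, ref_rac, ref_wu_per_day):
    """Scale one volunteer's throughput up to a group by RAC ratio.

    Assumes credit is proportional to completed work units (an assumption).
    """
    return total_rac / ref_rac * ref_wu_per_day


TOP20_RAC = 900_000      # approximate RAC of the first 20 users, from the post
MY_RAC = 30_000          # the poster's own RAC, from the post
OBSERVED_PER_DAY = 100   # rough rate at which the queue drains, from the post

low = estimate_wu_per_day(TOP20_RAC, MY_RAC, 0.5)
high = estimate_wu_per_day(TOP20_RAC, MY_RAC, 1.0)

# What the top-20 estimate leaves unexplained: other users, resends, failures.
unexplained_low = OBSERVED_PER_DAY - high
unexplained_high = OBSERVED_PER_DAY - low

print(f"top 20 users: {low:.0f} to {high:.0f} WU/day")
print(f"unaccounted for: {unexplained_low:.0f} to {unexplained_high:.0f} WU/day")
```

The gap is not evidence of hoarding by itself, since users outside the top 20 and tasks that fail quickly both draw from the same queue.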

Jean-David Beyer Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 64518 - Posted: 28 Sep 2021, 22:03:56 UTC - in response to Message 64510. The big difference for me is, I have never seen the server running out of Linux Tasks. Give it a day or two at this rate? It's running out in a hurry.
My machine has only been up for a few days, but it does chew up PIDs at a great rate. But once a process has completed and been waited for, its entry can be deleted from the process table and its PID can be re-used. You really need a PID only for each process in the process table. In the old days, UNIX and Linux could execute a small number (a few hundred) processes at a time, and the process table was searched sequentially. Later this was increased to something like 32000 PIDs, and IIRC, they were hashed instead of sequentially searched. So now, with 64-bit machines, perhaps they hash them all the time.
PID 1 is the equivalent of the old init process that starts off all the others. Its name is systemd. PID 2043 is started by systemd and is the BOINC client, which starts all the other BOINC tasks. So, as you can see in the list below, most of the BOINC tasks were started by the BOINC client, PID 2043. The only difference is the CPDN tasks: the BOINC client starts the processes at the bottom of this list, and these processes then start the actual worker processes, the ones at the top of the list. But my system is running normally and not doing much else other than running BOINC most of the time.
top - 17:44:52 up 4 days, 10:20, 1 user, load average: 8.20, 8.39, 8.55
Tasks: 452 total, 9 running, 440 sleeping, 3 stopped, 0 zombie
%Cpu(s): 0.2 us, 2.9 sy, 47.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
MiB Mem : 63902.3 total, 1470.5 free, 12065.6 used, 50366.2 buff/cache
MiB Swap: 15992.0 total, 15977.2 free, 14.8 used. 50988.2 avail Mem
    PID    PPID USER   PR  NI S     RES    SHR %MEM %CPU   P     TIME+ COMMAND
 308902  308888 boinc  39  19 R    1.3g  19904  2.1 99.6   2   1443:11 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
 331381  331356 boinc  39  19 T    1.3g  19748  2.1  0.0   7 975:29.32 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
 312209  312164 boinc  39  19 T    1.3g  19940  2.1  0.0   7 944:49.84 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
 371391  371381 boinc  39  19 T    1.3g  19940  2.1  0.0   5 319:35.24 /var/lib/boinc/projects/climateprediction.net/hadam4_um+
 417326    2043 boinc  39  19 R  866900  88936  1.3 99.3   7 159:57.54 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
 412389    2043 boinc  39  19 R  785968  76036  1.2 99.6   5 221:45.13 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
 411721    2043 boinc  39  19 R  781688  76036  1.2 99.5   6 231:42.70 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+
 416241    2043 boinc  39  19 R  759928  28688  1.2 99.5   3 175:16.58 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+
 391074    2043 boinc  39  19 R  715908  25232  1.1 99.8   8 513:27.05 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+
 425305    2043 boinc  39  19 R  107748   2676  0.2 99.5   1  41:11.11 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_a+
 427839    2043 boinc  39  19 R   68004   2456  0.1 99.6   0   7:01.12 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_m+
   2043       1 boinc  30  10 S   37208  17740  0.1  0.1   5  45382:39 /usr/bin/boinc
 371381    2043 boinc  39  19 S   17844  17176  0.0  0.0  14   0:29.50 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
 312164    2043 boinc  39  19 S   17764  17096  0.0  0.0  10   0:47.00 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
 308888    2043 boinc  39  19 S   17668  17004  0.0  0.1  10   0:45.53 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
 331356    2043 boinc  39  19 S   17668  17004  0.0  0.0  14   0:46.20 ../../projects/climateprediction.net/hadam4_8.52_i686-p+
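The parent/child relationships in that listing can be checked mechanically from the PID and PPID columns. The sketch below is illustrative only: the PID and PPID values and the truncated command names are copied from the top output in the post, while the helper names (children_by_parent, depth) are invented here. It shows that the Rosetta and WCG tasks sit two hops below PID 1, and the CPDN hadam4_um+ workers sit three hops below it, because each is started by a hadam4_8.52_i686-p+ process that the BOINC client launched.

```python
from collections import defaultdict

# (PID, PPID, command as truncated by top) taken from the listing in the post.
ROWS = [
    (308902, 308888, "hadam4_um+"),
    (331381, 331356, "hadam4_um+"),
    (312209, 312164, "hadam4_um+"),
    (371391, 371381, "hadam4_um+"),
    (417326, 2043, "rosetta_4.20_+"),
    (412389, 2043, "rosetta_4.20_+"),
    (411721, 2043, "rosetta_4.20_+"),
    (416241, 2043, "wcgrid_arp1_w+"),
    (391074, 2043, "wcgrid_arp1_w+"),
    (425305, 2043, "wcgrid_opn1_a+"),
    (427839, 2043, "wcgrid_mcm1_m+"),
    (2043, 1, "/usr/bin/boinc"),
    (371381, 2043, "hadam4_8.52_i686-p+"),
    (312164, 2043, "hadam4_8.52_i686-p+"),
    (308888, 2043, "hadam4_8.52_i686-p+"),
    (331356, 2043, "hadam4_8.52_i686-p+"),
]

BOINC_CLIENT = 2043


def children_by_parent(rows):
    """Map each parent PID to the list of PIDs it started."""
    tree = defaultdict(list)
    for pid, ppid, _cmd in rows:
        tree[ppid].append(pid)
    return tree


def depth(pid, parent_of, root=1):
    """Count the hops from pid up to the root process (PID 1)."""
    hops = 0
    while pid != root:
        pid = parent_of[pid]
        hops += 1
    return hops


parent_of = {pid: ppid for pid, ppid, _cmd in ROWS}
command_of = {pid: cmd for pid, _ppid, cmd in ROWS}
tree = children_by_parent(ROWS)

direct = tree[BOINC_CLIENT]
grandchildren = [p for p in parent_of if parent_of.get(parent_of[p]) == BOINC_CLIENT]

print(f"started directly by the BOINC client: {len(direct)}")
for pid in sorted(grandchildren):
    print(f"{pid} ({command_of[pid]}) is {depth(pid, parent_of)} hops below PID 1")
```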

SolarSyonyk Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463	Message 64519 - Posted: 28 Sep 2021, 23:11:27 UTC I was talking about the available tasks for download here... https://www.cpdn.org/server_status.php Nothing to do with running out of PIDs.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64524 - Posted: 29 Sep 2021, 4:08:50 UTC And they're all gone. There is one showing, but according to sources the mods have, it seems to be a ghost. I'll let the project know, and maybe the list can be "zeroed".

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64525 - Posted: 29 Sep 2021, 7:23:48 UTC And now there's 8 waiting, so it looks like they're failures. That makes it harder to see which ones they are.

Richard Haselgrove Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054	Message 64528 - Posted: 30 Sep 2021, 8:39:28 UTC - in response to Message 64474. so are there that many volunteers that never ever check their results? Unfortunately, yes. Looking at the 'Top participants' statistics page, user #19 is Science United. Science United is an anonymising account manager, which appears to have been deliberately designed by David Anderson to disconnect volunteers from the science and technicalities of the projects they run. Science United users can read our forums here (as any member of the public can), but can't ask questions or seek help. They aren't given a password to log into the account. Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment.

wateroakley Joined: 6 Aug 04 Posts: 195 Credit: 28,374,828 RAC: 10,749	Message 64529 - Posted: 30 Sep 2021, 10:45:52 UTC - in response to Message 64528. Last modified: 30 Sep 2021, 10:53:23 UTC Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment. With a few minutes of spare time ... rough orders of magnitude: For the last 30 days ... 847 devices have made contact. Of the first 100 computers with a recent credit number, about 75% are M$ and 25% are Linux. Scanning down the list for credits, there are a few WAH returns from M$ devices and six Linux devices with more than 1,000 recent credit. These six Linux devices are trickling 13 tasks to credit with 11 other tasks in the wings. More than 90% of the 847 do not appear to be doing anything useful, unsurprisingly for the 75% M$ cohort. Since 2019 ... the user appears to have 4470 devices attached. Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions.

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64539 - Posted: 30 Sep 2021, 15:22:33 UTC - in response to Message 64529. Since 2019 ... the user appears to have 4470 devices attached. Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions. I think there is a case for deleting the SU account.

Richard Haselgrove Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054	Message 64542 - Posted: 30 Sep 2021, 16:54:30 UTC - in response to Message 64539. Since 2019 ... the user appears to have 4470 devices attached. Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions. I think there is a case for deleting the SU account. Unfortunately, it'll keep getting re-created like a bad penny. I think the only guaranteed solution is for BOINC project administrators - collectively - to speak truth to power. They'll have to decide whether the 'power' in question is David Anderson in person, the Regents of the University of California corporately, or both.

JIM Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022	Message 64558 - Posted: 1 Oct 2021, 13:44:13 UTC - in response to Message 64542. Since 2019 ... the user appears to have 4470 devices attached. Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions. I think there is a case for deleting the SU account. Unfortunately, it'll keep getting re-created like a bad penny. I think the only guaranteed solution is for BOINC project administrators - collectively - to speak truth to power. They'll have to decide whether the 'power' in question is David Anderson in person, the Regents of the University of California corporately, or both. Power often just ignores truth and goes on doing as it pleases.

KAMasud Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0	Message 64559 - Posted: 1 Oct 2021, 18:50:54 UTC Well, there is nothing left to hoard. I thought I would never see the day when Linux tasks would be zero. :) Congratulations.

SolarSyonyk Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463	Message 64560 - Posted: 2 Oct 2021, 1:33:01 UTC Me either... Just processing through the last of the batches, I suppose. I've kind of built out some of my solar powered BOINC boxes for CPDN specifically - the eDRAM builds (huge L4 cache) are for optimizing the compute of these memory hungry beasties.

[SG]Felix Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374	Message 64563 - Posted: 2 Oct 2021, 7:24:31 UTC There have been no new work units for a few days now, but I still get resends to keep my VMs running. Let's see when new work will become available.

KAMasud Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0	Message 64578 - Posted: 3 Oct 2021, 12:12:37 UTC - in response to Message 64577. KAMasud - As pointed out elsewhere, on WCG the Africa Rainfall Project does NOT have GPU tasks. Only the Open Pandemics Project does. As far as your "ghost" tasks, I am not sure what you mean by your "accounts page". According to the CPDN server your Computer #1 (the one with the GPU) has 1 task in progress, Computer #2 (I710750H) has 7 in progress, and Computer #3 (I7-8750H) has 6 in progress. All of these tasks were downloaded in the last month or two. If the Tasks tab in the BOINC Manager on your computer shows more than these, you have "lost" or "ghost" tasks. To clear this up, when you have NO tasks on your computer according to the CPDN Server, go to the Projects tab and Remove the project. Wait 10 minutes. Add the Project back. This has always worked for me. _____________ The problem is, according to the site data I have thirteen WUs in progress, whereas I actually have five in progress. Eight WUs are missing (Internet black hole grabbed them). If I remove my computers from the project and then re-add them, will it solve the problem? i7-8750H has three WUs while i710750 has two WUs. Linux in VB. Then I have three being shown as validation pending? Now Africa Rainfall Project, why is my event log saying there are GPU WUs available but they need my Intel GPU and not my regular Nvidia? Never mind, not a life and death situation.
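The mismatch described above (thirteen tasks in progress on the website, five on the machines) amounts to a set difference between two lists of task names. A minimal sketch follows, assuming both lists have been copied out by hand from the website's task list and from the Tasks tab in BOINC Manager. The task names are placeholders invented for illustration and the helper find_ghosts is hypothetical; nothing here talks to the CPDN server or to the BOINC client.

```python
def find_ghosts(server_tasks, local_tasks):
    """Return tasks the server lists as in progress that no local client holds."""
    return sorted(set(server_tasks) - set(local_tasks))


# Placeholder names: 13 tasks shown on the website, 5 actually on the machines.
server_tasks = [f"example_task_{n:02d}" for n in range(1, 14)]
local_tasks = server_tasks[:5]

ghosts = find_ghosts(server_tasks, local_tasks)
print(f"{len(ghosts)} ghost task(s):")
for name in ghosts:
    print("  ", name)
```

With the counts from the post this reports eight ghost tasks, matching the eight described as missing.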

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64579 - Posted: 3 Oct 2021, 12:44:17 UTC - in response to Message 64578. Then I have three being shown as validation pending? There are a few BOINC options not used by this project; validation pending is one of them. So what's recorded against this option for each task is whatever was there when the task finished. So ignore it.