Thread 'Tasks by application = hoarding'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64471 - Posted: 16 Sep 2021, 13:49:15 UTC

I am always amazed to see four failures, and yet my machine has no trouble processing the work unit.

Not so surprising when most of the failures are due to missing libraries.
ID: 64471

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64472 - Posted: 16 Sep 2021, 20:36:30 UTC - in response to Message 64471.  

I am always amazed to see four failures, and yet my machine has no trouble processing the work unit.


Not so surprising when most of the failures are due to missing libraries.


I could make a statistically sloppy assumption and propose that 4/5 of the machines lack proper libraries. But it is so frequent.
But CPDN has always used 32-bit libraries, so are there that many volunteers that never ever check their results?
ID: 64472

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,033,903
RAC: 14,766
Message 64473 - Posted: 16 Sep 2021, 22:59:29 UTC - in response to Message 64471.  

Missing libraries and/or weird file systems that the BOINC client can't decipher.
ID: 64473

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64474 - Posted: 17 Sep 2021, 2:06:16 UTC - in response to Message 64472.  

so are there that many volunteers that never ever check their results?

Unfortunately, yes.
ID: 64474

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,033,903
RAC: 14,766
Message 64476 - Posted: 17 Sep 2021, 23:16:07 UTC - in response to Message 64474.  

so are there that many volunteers that never ever check their results?

Unfortunately, yes.


And by inference their tasks.
ID: 64476

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 64510 - Posted: 28 Sep 2021, 16:36:08 UTC - in response to Message 64448.  

The big difference for me is, I have never seen the server running out of Linux Tasks.


Give it a day or two at this rate? It's running out in a hurry.

Clearly an awful lot of the Linux tasks are being chewed through, though I can't see how many of the WUs are actually completed successfully vs hitting failures from missing libraries or such. I can see that a lot of the stuff I'm working on has plenty of other failures, though they're not all library failures: some are download errors, some are weird filesystem errors, etc.

I'm draining out some of my compute boxes as we head into winter (letting them finish CPDN tasks and not pull new ones), since I won't have the solar surplus to run the long tasks nearly as well, and I'll shift to WCG or such for heating (shorter tasks, and they handle machine shutdowns properly; suspend is fine in the summer, but in the winter it draws enough more power than a full shutdown to be annoying). I've got a perfectly good resistive heater, but it feels wrong to use that when I can be doing something useful with the power in the process anyway.

But, in terms of credit statistics (which seems a better way to analyze work done): https://www.cpdn.org/top_users.php

I turn in around 30k "recent average credit," and I'm returning about half a WU per day, ish. Depending on the day and sun and such, but in the range of 0.5 to 1 WU/day.

On the leaders page, there are about 900k "recent average credit" points showing for the first 20 users, plus plenty of other users.

So the front page alone, that's around 30 N216/day of compute, plus all the other users.

That seems a bit low for how rapidly WUs are being chewed through (seems to be around 100/day?), but I don't know if "hoarding" is to blame so much as machines rapidly failing a lot of WUs. I just don't think I've got the ability to see how many WUs fail for "science" reasons (negative theta or pressure, etc), vs "compute" reasons (no libraries or such).
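
The back-of-the-envelope estimate above can be written out as a minimal Python sketch. It assumes credit scales linearly with completed work units, which is a simplification and not something the project guarantees. The function name and parameters are made up for illustration. The reference figures (30k RAC for roughly 0.5 to 1 WU/day, and about 900k RAC across the first 20 users) are the ones quoted in this post.

```python
def wu_per_day(rac, ref_rac=30_000, ref_rate=(0.5, 1.0)):
    """Scale a reference WU/day range linearly by recent average credit.

    Assumption: credit is proportional to work units completed.
    """
    scale = rac / ref_rac
    return tuple(scale * r for r in ref_rate)


# About 900k RAC is shown for the first 20 users on the leaders page.
low, high = wu_per_day(900_000)
print(f"Top 20 users: roughly {low:.0f} to {high:.0f} WU/day")
```

Under those assumptions the front page works out to somewhere between 15 and 30 WU/day, so the "around 30" figure is the upper end of the range.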
ID: 64510

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64518 - Posted: 28 Sep 2021, 22:03:56 UTC - in response to Message 64510.  

The big difference for me is, I have never seen the server running out of Linux Tasks.

Give it a day or two at this rate? It's running out in a hurry.


My machine has only been up for a few days, but it does chew up PIDs at a great rate. But once a process has completed and been waited for, its entry can be deleted from the process table and its PID can be re-used. You really need a PID only for each process in the process table.

In the old days, UNIX and Linux could run only a small number of processes at a time (a few hundred), and the process table was searched sequentially. Later this was increased to something like 32000 PIDs, and IIRC, they were hashed instead of sequentially searched. So now, with 64-bit machines, perhaps they hash them all the time.
PID 1 is the equivalent of the old init process that starts off all the others. Its name is systemd.
PID 2043 is started by systemd and is the BOINC client, which starts all the other BOINC tasks. So, as you can see in the list below, most of the BOINC tasks were started by the BOINC client, PID 2043. The only difference is the CPDN tasks. The BOINC client starts the processes at the bottom of this list, and these processes then start the actual worker processes, the ones at the top of the list. But my system is running normally and not doing much else other than running BOINC most of the time.

top - 17:44:52 up 4 days, 10:20,  1 user,  load average: 8.20, 8.39, 8.55
Tasks: 452 total,   9 running, 440 sleeping,   3 stopped,   0 zombie
%Cpu(s):  0.2 us,  2.9 sy, 47.0 ni, 49.8 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st
MiB Mem :  63902.3 total,   1470.5 free,  12065.6 used,  50366.2 buff/cache
MiB Swap:  15992.0 total,  15977.2 free,     14.8 used.  50988.2 avail Mem 

    PID    PPID USER      PR  NI S    RES    SHR  %MEM  %CPU  P     TIME+ COMMAND                                                  
 308902  308888 boinc     39  19 R   1.3g  19904   2.1  99.6  2   1443:11 /var/lib/boinc/projects/climateprediction.net/hadam4_um+ 
 331381  331356 boinc     39  19 T   1.3g  19748   2.1   0.0  7 975:29.32 /var/lib/boinc/projects/climateprediction.net/hadam4_um+ 
 312209  312164 boinc     39  19 T   1.3g  19940   2.1   0.0  7 944:49.84 /var/lib/boinc/projects/climateprediction.net/hadam4_um+ 
 371391  371381 boinc     39  19 T   1.3g  19940   2.1   0.0  5 319:35.24 /var/lib/boinc/projects/climateprediction.net/hadam4_um+ 
 417326    2043 boinc     39  19 R 866900  88936   1.3  99.3  7 159:57.54 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+ 
 412389    2043 boinc     39  19 R 785968  76036   1.2  99.6  5 221:45.13 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+ 
 411721    2043 boinc     39  19 R 781688  76036   1.2  99.5  6 231:42.70 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_+ 
 416241    2043 boinc     39  19 R 759928  28688   1.2  99.5  3 175:16.58 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+ 
 391074    2043 boinc     39  19 R 715908  25232   1.1  99.8  8 513:27.05 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_w+ 
 425305    2043 boinc     39  19 R 107748   2676   0.2  99.5  1  41:11.11 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_a+ 
 427839    2043 boinc     39  19 R  68004   2456   0.1  99.6  0   7:01.12 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_m+ 
   2043       1 boinc     30  10 S  37208  17740   0.1   0.1  5  45382:39 /usr/bin/boinc                                           
 371381    2043 boinc     39  19 S  17844  17176   0.0   0.0 14   0:29.50 ../../projects/climateprediction.net/hadam4_8.52_i686-p+ 
 312164    2043 boinc     39  19 S  17764  17096   0.0   0.0 10   0:47.00 ../../projects/climateprediction.net/hadam4_8.52_i686-p+ 
 308888    2043 boinc     39  19 S  17668  17004   0.0   0.1 10   0:45.53 ../../projects/climateprediction.net/hadam4_8.52_i686-p+ 
 331356    2043 boinc     39  19 S  17668  17004   0.0   0.0 14   0:46.20 ../../projects/climateprediction.net/hadam4_8.52_i686-p+ 
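
The parent/child relationships in that listing can be checked by grouping the PID and PPID columns. Here is a minimal Python sketch, with the rows copied by hand from the top output above. The short labels are abbreviations of the COMMAND column, and the helper names are made up for illustration.

```python
from collections import defaultdict

# (PID, PPID, label) copied from the top listing; labels are shortened by hand.
ROWS = [
    (308902, 308888, "hadam4_um"),
    (331381, 331356, "hadam4_um"),
    (312209, 312164, "hadam4_um"),
    (371391, 371381, "hadam4_um"),
    (417326, 2043, "rosetta"),
    (412389, 2043, "rosetta"),
    (411721, 2043, "rosetta"),
    (416241, 2043, "wcgrid_arp1"),
    (391074, 2043, "wcgrid_arp1"),
    (425305, 2043, "wcgrid_opn1"),
    (427839, 2043, "wcgrid_mcm1"),
    (2043, 1, "boinc"),
    (371381, 2043, "hadam4_8.52"),
    (312164, 2043, "hadam4_8.52"),
    (308888, 2043, "hadam4_8.52"),
    (331356, 2043, "hadam4_8.52"),
]


def children_by_parent(rows):
    """Map each parent PID to the list of PIDs it started."""
    tree = defaultdict(list)
    for pid, ppid, _label in rows:
        tree[ppid].append(pid)
    return tree


tree = children_by_parent(ROWS)
labels = {pid: label for pid, _ppid, label in ROWS}


def show(pid, depth=0):
    """Print the process tree below pid, indented by depth."""
    print("  " * depth + f"{pid} {labels.get(pid, '?')}")
    for child in sorted(tree.get(pid, [])):
        show(child, depth + 1)


show(2043)
```

Printed this way, every task hangs directly off the BOINC client (PID 2043) except the four hadam4_um workers, which each sit one level deeper under a hadam4_8.52 process.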

ID: 64518

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 64519 - Posted: 28 Sep 2021, 23:11:27 UTC

I was talking about the available tasks for download here... https://www.cpdn.org/server_status.php

Nothing to do with running out of PIDs.
ID: 64519

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64524 - Posted: 29 Sep 2021, 4:08:50 UTC

And they're all gone.

There is one showing, but according to sources the mods have, it seems to be a ghost.
I'll let the project know, and maybe the list can be "zeroed".
ID: 64524

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64525 - Posted: 29 Sep 2021, 7:23:48 UTC

And now there's 8 waiting, so it looks like they're failures.
That makes it harder to see which ones they are.
ID: 64525

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 64528 - Posted: 30 Sep 2021, 8:39:28 UTC - in response to Message 64474.  

so are there that many volunteers that never ever check their results?

Unfortunately, yes.

Looking at the 'Top participants' statistics page, user #19 is Science United.

Science United is an anonymising account manager, which appears to have been deliberately designed by David Anderson to disconnect volunteers from the science and technicalities of the projects they run. Science United users can read our forums here (as any member of the public can), but can't ask questions or seek help. They aren't given a password to log into the account.

Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment.
ID: 64528

wateroakley
Joined: 6 Aug 04
Posts: 195
Credit: 28,374,828
RAC: 10,749
Message 64529 - Posted: 30 Sep 2021, 10:45:52 UTC - in response to Message 64528.  
Last modified: 30 Sep 2021, 10:53:23 UTC

Science United has attached 846 computers to this project since 16 Feb 2019. Somebody else can check how many tasks they have in progress at the moment.
With a few minutes of spare time ... rough orders of magnitude:

For the last 30 days ... 847 devices have made contact.
Of the first 100 computers with a recent credit number, about 75% are M$ and 25% are Linux.
Scanning down the list for credits shows a few WAH returns from M$ devices and six Linux devices with more than 1,000 recent credit.
These six Linux devices are trickling 13 tasks to credit with 11 other tasks in the wings.
More than 90% of the 847 do not appear to be doing anything useful, unsurprisingly for the 75% M$ cohort.

Since 2019 ... the user appears to have 4470 devices attached.
Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions.
ID: 64529

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64539 - Posted: 30 Sep 2021, 15:22:33 UTC - in response to Message 64529.  

Since 2019 ... the user appears to have 4470 devices attached.
Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions.


I think there is a case for deleting the SU account.
ID: 64539

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 64542 - Posted: 30 Sep 2021, 16:54:30 UTC - in response to Message 64539.  

Since 2019 ... the user appears to have 4470 devices attached.
Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions.
I think there is a case for deleting the SU account.
Unfortunately, it'll keep getting re-created like a bad penny.

I think the only guaranteed solution is for BOINC project administrators - collectively - to speak truth to power. They'll have to decide whether the 'power' in question is David Anderson in person, the Regents of the University of California corporately, or both.
ID: 64542

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 64558 - Posted: 1 Oct 2021, 13:44:13 UTC - in response to Message 64542.  

Since 2019 ... the user appears to have 4470 devices attached.
Scanning down the sorted list for credits, somewhere around 90% have not successfully returned task completions.
I think there is a case for deleting the SU account.
Unfortunately, it'll keep getting re-created like a bad penny.

I think the only guaranteed solution is for BOINC project administrators - collectively - to speak truth to power. They'll have to decide whether the 'power' in question is David Anderson in person, the Regents of the University of California corporately, or both.


Power often just ignores truth and goes on doing as it pleases.
ID: 64558

KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64559 - Posted: 1 Oct 2021, 18:50:54 UTC

Well, there is nothing left to hoard. I thought I would never see the day when Linux tasks would be zero. :) Congratulations.
ID: 64559

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 64560 - Posted: 2 Oct 2021, 1:33:01 UTC

Me either... Just processing through the last of the batches, I suppose. I've kind of built out some of my solar powered BOINC boxes for CPDN specifically - the eDRAM builds (huge L4 cache) are for optimizing the compute of these memory hungry beasties.
ID: 64560

[SG]Felix
Joined: 4 Oct 15
Posts: 34
Credit: 9,075,151
RAC: 374
Message 64563 - Posted: 2 Oct 2021, 7:24:31 UTC

There have been no new work units for a few days, but I still get resends to keep my VMs running. Let's see when new work becomes available.
ID: 64563

KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64578 - Posted: 3 Oct 2021, 12:12:37 UTC - in response to Message 64577.  

KAMasud -

As pointed out elsewhere, on WCG the Africa Rainfall Project does NOT have GPU tasks. Only the Open Pandemics Project does.

As far as your "ghost" tasks, I am not sure what you mean by your "accounts page".

According to the CPDN server your Computer #1 (the one with the GPU) has 1 task in progress, Computer #2 (I7-10750H) has 7 in progress, and Computer #3 (I7-8750H) has 6 in progress. All of these tasks were downloaded in the last month or two.

If the Tasks tab in the BOINC Manager on your computer shows more than these, you have "lost" or "ghost" tasks.

To clear this up, when you have NO tasks on your computer according to the CPDN Server, go to the Projects tab and Remove the project. Wait 10 minutes. Add the Project back. This has always worked for me.

_____________

The problem is that, according to the site data, I have thirteen WUs in progress, whereas I actually have five in progress. Eight WUs are missing (an Internet black hole grabbed them). If I remove my computers from the project and then re-add them, will it solve the problem? The i7-8750H has three WUs while the i7-10750H has two WUs. Linux in VB.
Then I have three being shown as validation pending?
Now Africa Rainfall Project, why is my event log saying there are GPU WU's available but they need my Intel GPU and not my regular Nvidia?
Never mind, not a life and death situation.
ID: 64578

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64579 - Posted: 3 Oct 2021, 12:44:17 UTC - in response to Message 64578.  

Then I have three being shown as validation pending?

There are a few BOINC options not used by this project; validation pending is one of them.
So what's recorded against this option for each task is whatever was there when the task finished.

So ignore it.
ID: 64579