climateprediction.net (CPDN)
Thread 'What does "Didn't need" mean on work-unit status webpage?'

Message boards : Number crunching : What does "Didn't need" mean on work-unit status webpage?

Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 68196 - Posted: 4 Feb 2023, 14:07:15 UTC

For some recent OpenIFS batch resends I've seen a few workunits that have "Didn't need" in the status column of the workunit webpage. For example:
https://www.cpdn.org/workunit.php?wuid=12172588

I've never seen that before. I'm used to seeing 'don't need any tasks' in the messages on the client side. But this is on the server side, so I'm wondering how that came about?

It's strange because it uses up one of the 3 possible attempts without the task seemingly ever having been downloaded to a host.

Does anyone know what causes this?
ID: 68196
PDW

Joined: 29 Nov 17
Posts: 82
Credit: 14,454,482
RAC: 90,922
Message 68197 - Posted: 4 Feb 2023, 14:35:18 UTC - in response to Message 68196.  

Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need".

It was created 21 Dec 2022, 9:57:22 UTC so I'm guessing it never made it out to a host for processing given the problems around that time.
Why it didn't escape when networking problems got resolved is a mystery but since it went past its 1 month deadline it has a new chance to create life.
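To make the rule above concrete, here is a toy Python model of it: once a workunit has a valid result, any task still waiting to be sent is closed with the outcome "Didn't need". This is a hypothetical sketch, not CPDN's or BOINC's server code; the names ServerState, Outcome, Task and on_valid_result are made up, and only the status labels echo the ones on the task pages.

```python
from dataclasses import dataclass
from enum import Enum


class ServerState(Enum):
    """Server-side state of a task (labels similar to the task pages)."""
    UNSENT = "Unsent"
    IN_PROGRESS = "In progress"
    OVER = "Over"


class Outcome(Enum):
    """Outcome recorded once a task is over; '---' means none yet."""
    NONE = "---"
    SUCCESS = "Success"
    DIDNT_NEED = "Didn't need"


@dataclass
class Task:
    task_id: int
    state: ServerState = ServerState.UNSENT
    outcome: Outcome = Outcome.NONE


def on_valid_result(tasks: list[Task]) -> None:
    """Assumed rule: once the workunit has a valid result, close every task
    that was never sent out with the outcome "Didn't need"."""
    for task in tasks:
        if task.state is ServerState.UNSENT:
            task.state = ServerState.OVER
            task.outcome = Outcome.DIDNT_NEED
        # Tasks already in progress on a host are left alone in this sketch.


# Hypothetical workunit: task 1 came back valid, task 2 is still waiting to be sent.
tasks = [Task(1, ServerState.OVER, Outcome.SUCCESS), Task(2)]
on_valid_result(tasks)
for task in tasks:
    print(task.task_id, task.state.value, task.outcome.value)
```

Running it reports task 2 as Over / Didn't need, while task 1 keeps its Success outcome.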
ID: 68197
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68200 - Posted: 4 Feb 2023, 16:01:25 UTC - in response to Message 68197.  

PDW wrote:
Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need".


I do not understand how that applies to the current task my machine is working on.
If the one marked "Didn't need" ran before mine, why send it to me at all? And if my machine was the first one to run it, I have not finished anywhere near this task, so how can they decide mine will complete successfully?
It is true that mine do complete successfully.

Workunit 12173514
name 	oifs_43r3_ps_0870_1987050100_123_956_12173514
application 	OpenIFS 43r3 Perturbed Surface
created 	21 Dec 2022, 12:24:43 UTC
minimum quorum 	1
initial replication 	1
max # of error/total/success tasks 	3, 3, 1
Task	Computer	Sent	Time reported or deadline	Status	Run time (sec)	CPU time (sec)	Credit	Application
22258087 	--- 	--- 	--- 	Didn't need 	0.00 	0.00 	--- 	---
22305507 	1511241 	4 Feb 2023, 12:25:23 UTC 	5 Apr 2023, 12:25:23 UTC 	In progress 	--- 	--- 	--- 	OpenIFS 43r3 Perturbed Surface v1.05 x86_64-pc-linux-gnu

OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed 	214
Max tasks per day 	218
Number of tasks today 	1
Consecutive valid tasks 	214
Average processing rate 	28.28 GFLOPS
Average turnaround time 	3.57 days

ID: 68200
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 68201 - Posted: 4 Feb 2023, 16:20:05 UTC - in response to Message 68197.  

PDW wrote:
Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need".

It was created 21 Dec 2022, 9:57:22 UTC so I'm guessing it never made it out to a host for processing given the problems around that time.
Why it didn't escape when networking problems got resolved is a mystery but since it went past its 1 month deadline it has a new chance to create life.
Ah yes, I forgot to look at the creation date for the task. Thanks, that makes sense. I guess it was in the queue to be sent, but with thousands ahead of it, the resend time came around before it went out.

Jean-David wrote:
I do not understand how that applies to the current task my machine is working on.
If the one marked "Didn't need" ran before mine, why send it to me at all?
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.
ID: 68201
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,819,420
RAC: 19,777
Message 68206 - Posted: 5 Feb 2023, 2:22:11 UTC - in response to Message 68201.  

Glenn Carver wrote:
Jean-David wrote:
I do not understand how that applies to the current task my machine is working on.
If the one marked "Didn't need" ran before mine, why send it to me at all?
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.

I think it does; it seems to be exactly the same situation as yours. These tasks expired while still in the queue waiting to be sent out. I'm trying to remember whether I saw something similar at MilkyWay last year, when they had validation issues and an excessively large queue of tasks was being generated.
ID: 68206
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68207 - Posted: 5 Feb 2023, 2:41:00 UTC - in response to Message 68206.  
Last modified: 5 Feb 2023, 2:47:14 UTC

AndreyOR wrote:
I think it does; it seems to be exactly the same situation as yours. These tasks expired while still in the queue waiting to be sent out.


I wish I knew which one was sent out first.

One was marked "didn't need" or something like that.
Mine is still in progress: as of about an hour ago, it was 92% complete after about 14 hours.

So if the other one started first, on what basis was it marked "Didn't need"? Surely not because my task reported successful completion (I predict it will, but they have no basis for knowing my prediction is correct). And if mine started first, they could send it to the other machine as well, but why? I started execution as soon as I received it, or as soon as a core became free, which would have been less than an hour after receiving it.

Here is the other one.
Task 22258087
Name 	oifs_43r3_ps_0870_1987050100_123_956_12173514_0
Workunit 	12173514
Created 	21 Dec 2022, 12:24:44 UTC


And here is mine.
Task 22305507
Name 	oifs_43r3_ps_0870_1987050100_123_956_12173514_1
Workunit 	12173514
Created 	4 Feb 2023, 12:24:48 UTC
Sent 	4 Feb 2023, 12:25:23 UTC
Report deadline 	5 Apr 2023, 12:25:23 UTC
Received 	---
Server state 	In progress
Outcome 	---
Client state 	New
Exit status 	0 (0x00000000)
Computer ID 	1511241


Suddenly it all becomes obscure.
ID: 68207
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,819,420
RAC: 19,777
Message 68208 - Posted: 5 Feb 2023, 3:55:21 UTC - in response to Message 68207.  

Jean-David wrote:
Suddenly it all becomes obscure.

It has nothing to do with you; it's just the logic of the system. A task can eventually expire in the queue if it doesn't get sent out in time, and it then gets marked as "Didn't need", presumably because it was never sent out. But because there's still no valid result, and neither 3 attempts have been made nor 3 errors recorded, a new task gets generated and placed in the queue. If this happens 3 times, the workunit will be marked as possibly having bugs or something similar. At least that's how I think it works.

So the task you got (like the one Glenn found) is the 2nd "attempt" at that workunit, but you're the first person to try to process a task from it. The 1st task just expired while waiting in line to be sent out; it was never sent out and thus never attempted. If you process your task successfully, the workunit will get a valid result and that'll be the end of it.
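The logic described above, which AndreyOR offers as his own understanding rather than confirmed server behaviour, can be written down as a small Python sketch. It assumes that an unsent task whose time runs out is simply marked "Didn't need", and that a replacement is queued unless the limits shown on the workunit page (3 error tasks, 3 total tasks) have been reached. The function names expire_unsent and next_step and the plain string statuses are hypothetical; BOINC's real server logic may differ.

```python
# Limits copied from the workunit page: max # of error/total/success tasks 3, 3, 1
MAX_ERROR_TASKS = 3
MAX_TOTAL_TASKS = 3


def expire_unsent(statuses: list[str]) -> None:
    """Assumed rule: a task still unsent when its time runs out becomes "Didn't need"."""
    for i, status in enumerate(statuses):
        if status == "Unsent":
            statuses[i] = "Didn't need"


def next_step(statuses: list[str]) -> str:
    """Decide what happens once no task of the workunit is pending any more."""
    if "Valid" in statuses:
        return "done"
    if statuses.count("Error") >= MAX_ERROR_TASKS or len(statuses) >= MAX_TOTAL_TASKS:
        return "workunit flagged: too many errors or tasks"
    statuses.append("Unsent")  # a replacement task joins the back of the send queue
    return "new task queued"


# Like workunit 12173514: the first task expired in the queue, so a second was created.
wu = ["Unsent"]
expire_unsent(wu)
print(next_step(wu), wu)
wu[-1] = "Valid"  # the second task was sent, ran and validated
print(next_step(wu))

# A workunit whose tasks keep expiring unsent uses up all three attempts.
stuck = ["Unsent"]
outcomes = []
for _ in range(3):
    expire_unsent(stuck)
    outcomes.append(next_step(stuck))
print(outcomes)
```

In this model an expired task counts toward the total of 3 just like a real attempt, which is why Glenn's first post notes that it "uses up" one of the 3 runs.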
ID: 68208
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68209 - Posted: 5 Feb 2023, 4:27:10 UTC - in response to Message 68208.  

OK: that is the end of it.

Task 22305507
Name 	oifs_43r3_ps_0870_1987050100_123_956_12173514_1
Workunit 	12173514
Created 	4 Feb 2023, 12:24:48 UTC
Sent 	4 Feb 2023, 12:25:23 UTC
Report deadline 	5 Apr 2023, 12:25:23 UTC
Received 	5 Feb 2023, 3:55:04 UTC
Server state 	Over
Outcome 	Success <---<<<
Client state 	Done
Exit status 	0 (0x00000000)
Computer ID 	1511241
Run time 	15 hours 19 min 36 sec
CPU time 	15 hours 1 min 29 sec
Validate state 	Valid <---<<<
Credit 	0.00
Device peak FLOPS 	6.06 GFLOPS
Application version 	OpenIFS 43r3 Perturbed Surface v1.05 x86_64-pc-linux-gnu
Peak working set size 	4,729.18 MB
Peak swap size 	4,974.32 MB
Peak disk usage 	1,224.17 MB

ID: 68209
alanb1951

Joined: 31 Aug 04
Posts: 37
Credit: 9,581,380
RAC: 3,853
Message 68210 - Posted: 5 Feb 2023, 4:37:59 UTC - in response to Message 68206.  
Last modified: 5 Feb 2023, 4:46:40 UTC

AndreyOR wrote:
Glenn Carver wrote:
Jean-David wrote:
I do not understand how that applies to the current task my machine is working on.
If the one marked "Didn't need" ran before mine, why send it to me at all?
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.

I think it does; it seems to be exactly the same situation as yours. These tasks expired while still in the queue waiting to be sent out. I'm trying to remember whether I saw something similar at MilkyWay last year, when they had validation issues and an excessively large queue of tasks was being generated.

A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly [1].

So if a work unit is to be withdrawn and resubmitted with changed parameters (or with a revised application, or...) there may be tasks marked "Didn't need" as a result of that...
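The cancellation case above can be sketched in Python as well. This is a hypothetical model, not BOINC's admin tooling: cancel_unsent and the batch dictionary are made up, and the sketch assumes that only tasks still waiting in the send queue are switched to "Didn't need". Tasks already on a host are deliberately left unchanged, because the thread does not settle how those are reported.

```python
from collections import Counter


def cancel_unsent(batch: dict[str, list[str]], withdrawn: set[str]) -> Counter:
    """Hypothetical admin cancel of some workunits in a batch.

    Only tasks still waiting in the send queue ("Unsent") are switched to
    "Didn't need". Tasks already on a host keep their status here, because
    how the server labels those is not modelled.
    """
    for wu in withdrawn:
        batch[wu] = ["Didn't need" if s == "Unsent" else s for s in batch[wu]]
    # Tally the statuses across the whole batch after the cancel.
    return Counter(s for statuses in batch.values() for s in statuses)


# Hypothetical batch of three workunits; the admin withdraws two of them.
batch = {
    "wu_a": ["Valid"],
    "wu_b": ["Unsent"],
    "wu_c": ["In progress"],
}
summary = cancel_unsent(batch, {"wu_b", "wu_c"})
print(batch)
print(summary)
```

Only wu_b changes here: its unsent task ends up as "Didn't need", while the valid and in-progress tasks keep their statuses.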

Cheers - Al.

[1] I did a code dive at the time, then had an exchange of information with their Admin about the reason their generator had run away - I think they have fixed it now :-)
ID: 68210
Yeti

Joined: 5 Aug 04
Posts: 178
Credit: 18,743,701
RAC: 48,070
Message 68211 - Posted: 5 Feb 2023, 14:58:46 UTC

Meanwhile, I have received lots of workunits with a "Didn't need" task, and they run fine.
Supporting BOINC, a great concept!
ID: 68211
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,819,420
RAC: 19,777
Message 68217 - Posted: 7 Feb 2023, 7:18:38 UTC - in response to Message 68210.  
Last modified: 7 Feb 2023, 7:19:45 UTC

alanb1951 wrote:
A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly [1].

So if a work unit is to be withdrawn and resubmitted with changed parameters (or with a revised application, or...) there may be tasks marked "Didn't need" as a result of that...

This would seem to mean that changes can be made mid-batch. So if Glenn makes code changes to fix bugs or make other improvements, CPDN could cancel unsent tasks, release a new app version and turn things back on again. This came to mind because Glenn has mentioned that he's improved the app but we have to wait until the next batch is released. It seems that doesn't have to be the case. The downside is that doing it this way would use up one of the 3 attempts, but perhaps that could be changed too, to compensate?

[1] I did a code dive at the time, then had an exchange of information with their Admin about the reason their generator had run away - I think they have fixed it now :-)

It's good that part is fixed. Hopefully they'll upgrade their servers, both hardware and software, sometime soon. They received the equipment months ago, as well as input on what changes to make before recompiling the server software.
ID: 68217
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 68218 - Posted: 7 Feb 2023, 11:11:02 UTC - in response to Message 68217.  

alanb1951 wrote:
A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly.
So if a work unit is to be withdrawn and resubmitted with changed parameters (or with a revised application, or...) there may be tasks marked "Didn't need" as a result of that...
AndreyOR wrote:
This would seem to mean that changes can be made mid-batch. So if Glenn makes code changes to fix bugs or make other improvements, CPDN could cancel unsent tasks, release a new app version and turn things back on again. This came to mind because Glenn has mentioned that he's improved the app but we have to wait until the next batch is released. It seems that doesn't have to be the case. The downside is that doing it this way would use up one of the 3 attempts, but perhaps that could be changed too, to compensate?

I think if a batch is cancelled, the tasks should show up as 'Cancelled' rather than 'Didn't need'? I think I've seen that on some of the other projects I run.

We don't change the software on active batches. For one thing, it messes up the task failure analyses; for another, it might risk altering results. Any new software has to go through a series of tests on the dev(elopment) site first, then it gets digitally signed before going out to the main production site. It's a fairly lengthy process to move code from development to production. I've not been with the project long enough to know whether they ever cancel batches mid-stream; none have been cancelled to my knowledge. I've seen other projects do it, but they tend to run much bigger batches.
ID: 68218

©2024 cpdn.org