What does "Didn't need" mean on work-unit status webpage?
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331
For some recent OpenIFS batch resends I've seen a few workunits that have "Didn't need" in the status column of the workunit webpage. For example: https://www.cpdn.org/workunit.php?wuid=12172588 I've never seen that before. I'm used to seeing 'don't need any tasks' in the messages on the client side, but this is on the server side, so I'm wondering how it came about. It's strange because it uses up one of the 3 possible runs without the task, seemingly, ever having been downloaded to a host. Does anyone know what causes this?
Joined: 29 Nov 17 Posts: 82 Credit: 14,451,997 RAC: 90,818
Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need". This one was created 21 Dec 2022, 9:57:22 UTC, so I'm guessing it never made it out to a host for processing, given the problems around that time. Why it didn't escape once the networking problems were resolved is a mystery, but since it went past its 1-month deadline it has a new chance to create life.
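The rule in the post above (once a valid result is back, copies that were never sent are retired) can be sketched as a toy model. This is an illustrative assumption, not BOINC's server code: the `Task` class, the `mark_unneeded` function and the status strings are invented here, loosely echoing the labels shown on the workunit and task pages.

```python
from dataclasses import dataclass
from enum import Enum


class ServerState(Enum):
    # Labels echo the "Server state" field on the task page (assumed subset).
    UNSENT = "Unsent"
    IN_PROGRESS = "In progress"
    OVER = "Over"


class Outcome(Enum):
    # Labels echo the "Outcome"/"Status" text on the task page (assumed subset).
    INIT = "---"
    SUCCESS = "Success"
    DIDNT_NEED = "Didn't need"


@dataclass
class Task:
    name: str
    server_state: ServerState = ServerState.UNSENT
    outcome: Outcome = Outcome.INIT
    validated: bool = False


def mark_unneeded(tasks: list) -> None:
    """Hypothetical rule: once any copy is validated, retire copies never sent."""
    if not any(t.validated for t in tasks):
        return  # no valid result yet, so every copy may still be needed
    for t in tasks:
        # Copies already running on a host are left alone in this sketch.
        if t.server_state is ServerState.UNSENT:
            t.server_state = ServerState.OVER
            t.outcome = Outcome.DIDNT_NEED


if __name__ == "__main__":
    tasks = [
        Task("wu_0", ServerState.OVER, Outcome.SUCCESS, validated=True),
        Task("wu_1"),  # still waiting in the send queue
    ]
    mark_unneeded(tasks)
    print(tasks[1].outcome.value)  # Didn't need
```

Only unsent copies are touched, which matches the "unstarted and therefore unwanted" wording; what a real server does with copies already on hosts is outside this sketch.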
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
PDW wrote: Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need".
I do not understand how that applies on the current task my machine is working on. If the one marked "Didn't need" ran before mine, why send it to me at all? And if my machine was the first one to run it, I am nowhere near finishing this task, so how can they decide mine will complete successfully? It is true that mine do complete successfully.
Workunit 12173514
name: oifs_43r3_ps_0870_1987050100_123_956_12173514
application: OpenIFS 43r3 Perturbed Surface
created: 21 Dec 2022, 12:24:43 UTC
minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 3, 3, 1

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
---|---|---|---|---|---|---|---|---
22258087 | --- | --- | --- | Didn't need | 0.00 | 0.00 | --- | ---
22305507 | 1511241 | 4 Feb 2023, 12:25:23 UTC | 5 Apr 2023, 12:25:23 UTC | In progress | --- | --- | --- | OpenIFS 43r3 Perturbed Surface v1.05 x86_64-pc-linux-gnu

OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed: 214
Max tasks per day: 218
Number of tasks today: 1
Consecutive valid tasks: 214
Average processing rate: 28.28 GFLOPS
Average turnaround time: 3.57 days
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331
PDW wrote: Usually "Didn't need" is because a valid result(s) has been received back by the server and it marks unstarted and therefore unwanted tasks as "Didn't need".
Ah yes, I forgot to look at the creation date for the task. Thanks, that makes sense. I guess it was in the queue to be sent, but with 1000s ahead of it, the resend time came around before it went out.
Jean-David wrote: I do not understand how that applies on the current task my machine is working on.
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.
Joined: 12 Apr 21 Posts: 317 Credit: 14,818,592 RAC: 19,777
Jean-David wrote: I do not understand how that applies on the current task my machine is working on.
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.
I think it does; it seems to be exactly the same situation as yours. These tasks expired while still in the queue waiting to be sent out. I'm trying to remember whether I saw something similar at MilkyWay last year, when they had validation issues and a very excessive queue of tasks was being generated.
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
AndreyOR wrote: I think it does, it seems to be the exact situation as yours. These tasks expired while still in queue waiting to be sent out.
I wish I knew which one was sent out first. One was marked "didn't need" or something like that. Mine is still in progress: as of about an hour ago it was 92% complete after about 14 hours. So if the other one started first, on what basis was it marked "didn't need"? Surely not because my task reported successful completion (I predict it will, but they have no basis for knowing my prediction is correct). And if mine started first, they could have sent it to the other machine as well, but why? I started execution as soon as I received it, or as soon after that as I had a free core to run it on, which would have been less than an hour after receiving it.
Here is the other one.
Task: 22258087
Name: oifs_43r3_ps_0870_1987050100_123_956_12173514_0
Workunit: 12173514
Created: 21 Dec 2022, 12:24:44 UTC

And here is mine.
Task: 22305507
Name: oifs_43r3_ps_0870_1987050100_123_956_12173514_1
Workunit: 12173514
Created: 4 Feb 2023, 12:24:48 UTC
Sent: 4 Feb 2023, 12:25:23 UTC
Report deadline: 5 Apr 2023, 12:25:23 UTC
Received: ---
Server state: In progress
Outcome: ---
Client state: New
Exit status: 0 (0x00000000)
Computer ID: 1511241

Suddenly it all becomes obscure.
Joined: 12 Apr 21 Posts: 317 Credit: 14,818,592 RAC: 19,777
Jean-David wrote: Suddenly it all becomes obscure.
It has nothing to do with you; it's just the logic of the system. A task can eventually expire in the queue if it doesn't get sent out in time, and it then gets marked as "Didn't need", presumably because it was never sent out. But because there's still no valid result, and neither 3 attempts have been made nor 3 errors recorded, a new task gets generated and placed in the queue. If this happens 3 times, the workunit will be flagged as possibly having bugs or some other problem. At least that's how I think it works. So the task you got (and Glenn's) is the 2nd "attempt" at that workunit, but you're the first person to try to process a task from it. The 1st task just expired while waiting in line; it was never sent out and thus never attempted. If you process your task successfully, the workunit will get a valid result and that'll be the end of it.
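The resend logic as described in the post above ("at least that's how I think it works") can be modelled roughly as follows. This is a hypothetical sketch, not BOINC's transitioner: the limits mirror the "max # of error/total/success tasks 3, 3, 1" line on the workunit page, while the `Workunit` class, its methods and the status strings are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical model of the resend rule described above. The limits mirror
# the "max # of error/total/success tasks 3, 3, 1" line on the workunit page;
# the class, method names and status strings are illustrative assumptions.


@dataclass
class Workunit:
    name: str
    max_error: int = 3
    max_total: int = 3
    statuses: dict = field(default_factory=dict)  # task name -> status text
    gave_up: bool = False

    def add_task(self) -> str:
        # Copies are named like the real tasks: <workunit name>_0, _1, ...
        task = f"{self.name}_{len(self.statuses)}"
        self.statuses[task] = "Unsent"
        return task

    def expire_unsent(self, task: str) -> None:
        """Retire a copy that waited too long in the send queue, unrun."""
        if self.statuses[task] == "Unsent":
            self.statuses[task] = "Didn't need"
        self.check_for_resend()

    def check_for_resend(self) -> None:
        states = list(self.statuses.values())
        if "Valid" in states or self.gave_up:
            return  # already finished, nothing more to do
        errors = states.count("Error")
        if errors >= self.max_error or len(states) >= self.max_total:
            self.gave_up = True  # workunit flagged as failing
        elif "Unsent" not in states and "In progress" not in states:
            self.add_task()  # no live copy left, so queue a fresh one


if __name__ == "__main__":
    wu = Workunit("oifs_43r3_ps_0870_1987050100_123_956_12173514")
    first = wu.add_task()
    wu.expire_unsent(first)  # the _0 copy never left the queue
    print(list(wu.statuses.values()))  # ["Didn't need", 'Unsent']
```

In this model a copy that expires unsent still counts toward the total of 3, which is why such a copy seems to "use up" one of the runs even though no host ever saw it.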
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
OK: that is the end of it.
Task: 22305507
Name: oifs_43r3_ps_0870_1987050100_123_956_12173514_1
Workunit: 12173514
Created: 4 Feb 2023, 12:24:48 UTC
Sent: 4 Feb 2023, 12:25:23 UTC
Report deadline: 5 Apr 2023, 12:25:23 UTC
Received: 5 Feb 2023, 3:55:04 UTC
Server state: Over
Outcome: Success <---<<<
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 1511241
Run time: 15 hours 19 min 36 sec
CPU time: 15 hours 1 min 29 sec
Validate state: Valid <---<<<
Credit: 0.00
Device peak FLOPS: 6.06 GFLOPS
Application version: OpenIFS 43r3 Perturbed Surface v1.05 x86_64-pc-linux-gnu
Peak working set size: 4,729.18 MB
Peak swap size: 4,974.32 MB
Peak disk usage: 1,224.17 MB
Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853
AndreyOR wrote: Jean-David wrote: I do not understand how that applies on the current task my machine is working on.
This doesn't apply to your tasks. I'm referring to a task that was never sent to anyone. It never ran.
A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly[1]. So if a work unit is to be withdrawn and resubmitted with changed parameters (or with a revised application, or...), there may be tasks marked "Didn't need" as a result of that...
Cheers - Al.
[1] I did a code dive at the time, then had an exchange of information with their Admin about the reason their generator had run away - I think they have fixed it now :-)
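The cancellation case in the post above can be sketched the same way. This is an assumption-laden illustration, not BOINC's admin tooling: `cancel_workunits`, the ID-range arguments and the status strings are hypothetical, and what happens to copies already running on hosts is deliberately left out.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an admin cancelling a range of workunits, as
# described above. Not BOINC's admin code: names and statuses are assumed.


@dataclass
class Workunit:
    wu_id: int
    statuses: list = field(default_factory=list)  # one status per task copy
    cancelled: bool = False


def cancel_workunits(workunits: list, first_id: int, last_id: int) -> int:
    """Cancel workunits with first_id <= wu_id <= last_id.

    Copies still waiting to be sent are retired as "Didn't need"; copies
    already running on hosts are left untouched in this sketch. Returns the
    number of copies retired.
    """
    retired = 0
    for wu in workunits:
        if not first_id <= wu.wu_id <= last_id:
            continue
        wu.cancelled = True  # a cancelled workunit should get no resends
        for i, status in enumerate(wu.statuses):
            if status == "Unsent":
                wu.statuses[i] = "Didn't need"
                retired += 1
    return retired


if __name__ == "__main__":
    batch = [Workunit(1, ["Unsent"]), Workunit(2, ["In progress", "Unsent"])]
    print(cancel_workunits(batch, 1, 2))  # 2
```

Cancelling by an ID range rather than one workunit at a time reflects the "in bulk" case mentioned for MilkyWay; whether a given project labels such copies "Didn't need" or something else is not settled by this sketch.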
Joined: 5 Aug 04 Posts: 178 Credit: 18,743,701 RAC: 48,070
Meanwhile I have had lots of "Didn't need" WUs; they do fine.
Supporting BOINC, a great concept!
Joined: 12 Apr 21 Posts: 317 Credit: 14,818,592 RAC: 19,777
A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly[1].
This would seem to mean that changes can be made mid-batch. So if Glenn makes code changes to fix bugs or make other improvements, CPDN could cancel the unsent tasks, release a new app version and turn things back on again. It made me think of this because Glenn has mentioned that he's improved the app but we have to wait until the next batch is released. It seems that doesn't have to be the case. The downside is that doing it this way would use up one of the 3 attempts, but perhaps that could be changed too, to compensate?
[1] I did a code dive at the time, then had an exchange of information with their Admin about the reason their generator had run away - I think they have fixed it now :-)
It's good that that part is fixed. Hopefully they'll upgrade their servers, both hardware and software, sometime soon. They got the equipment months ago, as well as input on what changes to make before recompiling the server software.
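The trade-off raised in the post above (a cancel-and-resubmit uses up one of the 3 attempts unless the limit is raised to compensate) can be put into a few lines. This is purely illustrative: `attempts_left`, `cancel_and_requeue` and the `compensate` flag are hypothetical names, and the model assumes every copy ever created counts toward a total limit of 3.

```python
# Hypothetical sketch of the "cancel unsent tasks, then resubmit" idea above,
# assuming every copy ever created counts toward a limit of 3 total tasks.
# Function names and the compensate flag are illustrative, not CPDN's setup.


def attempts_left(statuses: list, max_total: int = 3) -> int:
    """Copies the server could still create for one workunit."""
    return max(0, max_total - len(statuses))


def cancel_and_requeue(statuses: list, max_total: int = 3, compensate: bool = False) -> tuple:
    """Retire unsent copies, then queue one fresh copy for the new app version.

    With compensate=True the limit is raised by the number of copies this
    cancellation retired, so the cancel does not use up an attempt.
    """
    updated = ["Didn't need" if s == "Unsent" else s for s in statuses]
    if compensate:
        max_total += updated.count("Didn't need") - statuses.count("Didn't need")
    if attempts_left(updated, max_total) > 0:
        updated.append("Unsent")
    return updated, max_total


if __name__ == "__main__":
    for flag in (False, True):
        tasks, limit = cancel_and_requeue(["Unsent"], compensate=flag)
        print(flag, tasks, attempts_left(tasks, limit))
```

Without compensation a workunit that started with one unsent copy ends up with only one spare attempt after the resubmit; with it, two remain, as if the cancellation had never happened.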
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331
I think if a batch is cancelled, the tasks should show up as 'Cancelled' rather than 'Didn't need'? I think I've seen that on some of the other projects I run.
AndreyOR wrote: A task also gets marked "Didn't need" if the BOINC admin decides to cancel [part of?] a work unit -- that was what happened in bulk at MilkyWay, so you remembered correctly.
This would seem to mean that changes can be made mid-batch. So if Glenn makes code changes to deal with bugs or other improvements, CPDN can cancel unsent tasks, release new app version and turn things back on again. It made me think of this as Glenn has mentioned that he's improved the app but we have to wait until the next batch is released. It seems like that doesn't have to be the case. There's a downside that doing it this way would be using up one of the 3 attempts but perhaps that can be changed too to compensate?
We don't change the software on active batches. For one thing it messes up the task failure analyses, and for another it might risk altering results. Any new software has to go through a series of tests on the dev(elopment) site first, then it gets digitally signed before going out to the main production site. It's a fairly lengthy process to move code from development to production. I've not been with the project long enough to know whether they ever cancel batches mid-stream; none have been cancelled to my knowledge. I've seen other projects do it, but they tend to run much bigger batches.
©2024 cpdn.org