climateprediction.net
w/u completed but still showing as "Server state In progress"

Message boards : Number crunching : w/u completed but still showing as "Server state In progress"

nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 63089 - Posted: 3 Dec 2020, 20:14:31 UTC

So I sat and watched a w/u complete... sad, I know. It uploaded the 5th trickle OK, then reached the end of processing and produced a small file to upload, which also uploaded OK. The w/u status changed to "Ready to report", which it then did.

All very satisfying after 20 days. We don't all have supercomputers...

But a day later the w/u is still showing as "In progress" on the "tasks for computer" page.

https://www.cpdn.org/result.php?resultid=21957285

The computer it came from only has 1 cpdn task left to do before its hard disk is upgraded.
Has the server just not caught up yet, or is it another fiendish way of losing a w/u at 100%?
Ta
Nairb
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 63092 - Posted: 4 Dec 2020, 0:16:01 UTC - in response to Message 63089.  

I've had several of these across a few PCs, though none lately. I'm not sure whether there is some congestion/contention on the server when the reports happen, or something else. Maybe 2% of my completed models have done this. I don't think there is anything you can do about it. I've had it happen a few times when a task was reported around the time the credit script was running on the server. Yours may have been reported in that window (which is weekly, late Wed/early Thu UK time).

That task will time out in a year and may be reissued then, which is probably ridiculous. The only way for that work unit to be reissued sooner is if you detach and then reattach the computer to cpdn. Of course, you wouldn't want any cpdn tasks on that PC at the time you detach/reattach. It's up to you whether you want to do that; no big deal if you don't. After the detach/reattach, the completed (but not acknowledged as completed by the server) task will be labelled as abandoned on your list of tasks, and a new task from that work unit will be ready to go out to another computer. Like I said, it's completely up to you whether you go through that process.

Sorry about that...it's a bummer when it happens.
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 63108 - Posted: 9 Dec 2020, 21:40:21 UTC

Thanks for the answer... So it seems there is another way to lose a w/u at 100%. Without any warning, a w/u can be wasted at the final "ready to report" communication, and all is lost... along with any remaining sense of satisfaction, which is really all there is to gain from participating in these projects.
I will let the remaining 5 run to completion. Who knows - a couple of w/u might actually end up being useful.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63110 - Posted: 10 Dec 2020, 4:42:44 UTC - in response to Message 63108.  

The last zip upload should have gone to the server OK, so all is not lost.
It's just that the completion will not be registered on your tasks page, so you have to remember that mentally.
nairb

Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 63118 - Posted: 17 Dec 2020, 2:21:36 UTC - in response to Message 63110.  

OK, so the w/u should still be valid for the scientists but just shows "In progress" on the tasks page, which is a better outcome.

It must be possible to run a script to update the tasks page. But if the task has finished OK, I guess there is little incentive to do so.
Bryn Mawr
Joined: 28 Jul 19
Posts: 149
Credit: 12,830,559
RAC: 228
Message 63235 - Posted: 31 Dec 2020, 19:52:47 UTC - in response to Message 63092.  

Is there no true handshake? When the client fails to receive an acknowledgement, it should resend the report (and, naturally, the server should already be able to handle duplicates).
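To make the handshake being asked about concrete, here is a minimal sketch of "resend until acknowledged, with the server ignoring duplicates". This is not how BOINC or cpdn actually handle task reports; the `Server` class, `report_with_retry`, `make_dropper`, and the simulated lost acknowledgement are all hypothetical, chosen only to illustrate the idea.

```python
class Server:
    """Hypothetical project server that records completed results idempotently."""

    def __init__(self):
        self.completed = set()

    def report(self, result_id):
        # Adding the same ID twice has no extra effect, so a duplicate
        # report from a retrying client is harmless.
        self.completed.add(result_id)
        return "ack"


def make_dropper(n):
    """Hypothetical helper: simulate the first n acknowledgements being lost."""
    drops = iter([True] * n)
    return lambda: next(drops, False)


def report_with_retry(server, result_id, max_attempts=5, drop_ack=lambda: False):
    """Resend the report until an ack gets through or attempts run out.

    Returns the number of attempts used, or None if no ack was ever received.
    """
    for attempt in range(1, max_attempts + 1):
        ack = server.report(result_id)
        if ack == "ack" and not drop_ack():
            return attempt
    return None


if __name__ == "__main__":
    server = Server()
    # Result ID taken from the task link in the first post, used purely as an example.
    print(report_with_retry(server, 21957285, drop_ack=make_dropper(2)))
    print(server.completed)
```

With the first two acknowledgements lost, the client reports three times, yet the server still holds the result only once. That is the property that makes blind resending safe, whatever the project's actual reasons for not relying on it.
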
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63236 - Posted: 31 Dec 2020, 20:28:40 UTC - in response to Message 63235.  

Resend isn't used here because of the nature of climate models.
If a computer can't get it right the first time, for whatever reason, then the work unit is put back into the queue for the next computer to try, of which there are thousands, so it's not a big deal for the researchers.
Bryn Mawr
Joined: 28 Jul 19
Posts: 149
Credit: 12,830,559
RAC: 228
Message 63237 - Posted: 31 Dec 2020, 20:35:48 UTC - in response to Message 63236.  

Fair enough, it’s not a problem I’ve ever encountered, although blaming it on the client when the example given was a server fault... 😛
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63238 - Posted: 31 Dec 2020, 22:03:22 UTC - in response to Message 63237.  

OK, in that example, the process is well beyond the point where a Resend would be needed.
All of the data has reached the various servers and is being dealt with; then one of the servers has a sneezing attack and doesn't see one packet of data being moved past it, so the packet doesn't get ticked off on a task page.
Bryn Mawr

Posts: 149
Credit: 12,830,559
RAC: 228
Message 63239 - Posted: 31 Dec 2020, 23:18:33 UTC - in response to Message 63238.  

OK, I submit - too much work for too small a problem :-)


©2024 cpdn.org