w/u completed but still showing as "Server state In progress"
Author | Message |
---|---|
Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
So I sat and watched a w/u complete... sad, I know. It uploaded the 5th trickle OK, then came to the end of processing and produced a small file to upload, which also uploaded OK. The w/u status changed to "Ready to report", which it then did. All very satisfying after 20 days. We don't all have supercomputers... But a day later the w/u is still showing as "In progress" on the "tasks for computer" page: https://www.cpdn.org/result.php?resultid=21957285 The computer it came from has only 1 cpdn task left to do before its hard disk is upgraded. Has the server just not caught up yet, or is this another fiendish way of losing a w/u at 100%? Ta, Nairb |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
I've had several of these across a few PCs. None lately though. I'm not sure whether there is some congestion/contention on the server when the reports happen, or what. Maybe 2% of my completed models have done this. I don't think there is anything you can do about it. I've had it happen a few times when a task reported around the time that the credit script is running on the server. Yours may have transmitted in that window (which is weekly, late Wed/early Thu UK time). That task will time out in a year and may be reissued then, which is probably ridiculous. The only way for it to be reissued from that work unit before that is if you detach and then reattach the computer to cpdn. Of course you wouldn't want any cpdn tasks on that PC at the time you detached/reattached. It's up to you if you want to do that. No big deal if you don't. The completed (but not acknowledged as completed by the server) task will be labelled as abandoned on your list of tasks after the detach/reattach and a new task from that work unit would be ready to go out to another computer. Like I said, it's completely up to you if you want to go through that process. Sorry about that...it's a bummer when it happens. |
Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
Thanks for the answer... So it seems there is another way to lose a w/u at 100%. Without any warning, a w/u can be wasted on the last "ready to report" communication. And all is lost... along with any remaining sense of satisfaction, which is all there really is from participating in these projects. I will let the remaining 5 run to completion. Who knows - a couple of w/u might actually end up being useful. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The last zip upload should have gone to the server OK, so all is not lost. It's just that the completion will not be registered on your tasks page, so you have to remember that mentally. |
Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
"The last zip upload should have gone to the server OK, so all is not lost." OK, so the w/u should be valid for the scientists but just shows "in progress" on the tasks page, which is a better outcome. It must be possible to run a script to update the tasks page, but if the task has finished OK I guess there is little incentive to do this. |
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228 |
"I've had several of these across a few PCs. None lately though. I'm not sure whether there is some congestion/contention on the server when the reports happen, or what. Maybe 2% of my completed models have done this. I don't think there is anything you can do about it. I've had it happen a few times when a task reported around the time that the credit script is running on the server. Yours may have transmitted in that window (which is weekly, late Wed/early Thu UK time)." Is there no true handshake? When the client fails to receive an ack it should resend the WU (and, naturally, the server should be able to deal with duplicates already). |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Resend isn't used here because of the nature of climate models. If a computer can't get it right the first time, for whatever reason, then the task is put back into the queue for the next computer to try. There are thousands of those, so it's not a big deal for the researchers. |
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228 |
"Resend isn't used here because of the nature of climate models." Fair enough, it’s not a problem I’ve ever encountered - although blaming it on the client when the example given was a server fault ... 😛 |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, in that example, the process is way beyond the point where a Resend is needed. All of the data has reached the various servers and is being dealt with; then one of the servers has a sneezing attack and doesn't see one packet of data being moved past it, so the packet doesn't get ticked off on a task page. |
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228 |
"OK, in that example, the process is way beyond the point where a Resend is needed." OK, I submit - too much work for too small a problem :-) |
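
For anyone curious what the "true handshake" asked about in the thread would look like, here is a toy sketch in Python. It is purely illustrative and is not how the BOINC client or the CPDN servers work (as explained above, resend is not used here). The names `ToyServer`, `report_until_acked` and `acks_to_drop` are all made up for the example; only the result id is taken from the first post. The idea is that the client repeats its report until it hears an acknowledgement, and the server records a result only once, so duplicates are harmless.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical server that records each result once, keyed by result id."""
    completed: dict = field(default_factory=dict)
    acks_to_drop: int = 0  # simulates acknowledgements lost on the way back

    def report(self, result_id: int, payload: str) -> bool:
        # Idempotent: a duplicate report leaves the stored record untouched.
        self.completed.setdefault(result_id, payload)
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            return False  # result was recorded, but the client never hears back
        return True


def report_until_acked(server: ToyServer, result_id: int, payload: str,
                       max_attempts: int = 5) -> int:
    """Resend the report until it is acknowledged; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if server.report(result_id, payload):
            return attempt
    raise RuntimeError(
        f"no ack for result {result_id} after {max_attempts} attempts"
    )


# The first two acknowledgements are lost, so the client has to try three times.
server = ToyServer(acks_to_drop=2)
attempts = report_until_acked(server, 21957285, "final zip")
print(attempts, len(server.completed))
```

Even though the report is sent three times in this run, the server ends up with a single record for the result, which is the "deal with duplicates" part of the suggestion.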
©2024 cpdn.org