climateprediction.net (CPDN) home page
Thread 'Download Errors: Permanent HTTP -- Euro Region Tasks'

Message boards : Number crunching : Download Errors: Permanent HTTP -- Euro Region Tasks

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46662 - Posted: 21 Jul 2013, 22:36:52 UTC

I have had four download errors recently for UK Met Office HADAM3P European Region v6.09 tasks. All show similar messages in BOINC Manager:

climateprediction.net 7/21/2013 6:17:22 PM Giving up on download of hadam3p_eu_2lj2_1970_1_008118209.zip: permanent HTTP error

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,827,799
RAC: 5,038
Message 46664 - Posted: 21 Jul 2013, 22:49:22 UTC

These look like automated re-issues of old models (August 2012), for which some of the download files are no longer available. The project team have looked at this problem of "ancient" models reappearing several times and have not been able to stop it.

My advice is to look at the work unit date. If it is very old compared with the model run time (i.e. a few days for HADAM3P) and the download fails then just ignore it ...
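
As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name, the 3-day run time and the `factor` threshold are all assumptions made for the example; none of them are CPDN or BOINC values.

```python
from datetime import datetime, timedelta

def looks_like_stale_reissue(created, now, typical_run, factor=10.0):
    """Return True when the work unit is far older than a normal model run.

    `factor` is an arbitrary illustrative threshold, not a project setting.
    """
    return (now - created) > typical_run * factor

# Dates from this thread: work units created 8 Aug 2012, download failed 21 Jul 2013.
# The 3-day run time is only a stand-in for "a few days".
created = datetime(2012, 8, 8)
failed = datetime(2013, 7, 21)
print(looks_like_stale_reissue(created, failed, timedelta(days=3)))
```

A work unit that is flagged this way and also fails to download is most likely one of the ancient re-issues and can be ignored.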

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 46665 - Posted: 21 Jul 2013, 22:55:47 UTC - in response to Message 46664.  

These look like automated re-issues of old models (August 2012)...My advice is to look at the work unit date...

Thanks for the feedback, Iain. You are right: all four were created on 8 Aug 2012. I haven't worked on this project regularly for quite some time and am just curious. How common is this? Could it be related to the recent outage somehow?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46667 - Posted: 21 Jul 2013, 23:07:47 UTC - in response to Message 46662.  

It's all caused by a long period timer somewhere in the BOINC server code.
When the timer reaches the maximum value for that variable type, it overflows to zero, and BOINC thinks that it's a new data set, so it tries to issue it.
But all of the support files have long since been removed from the associated server, so BOINC can't find them to download them.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,827,799
RAC: 5,038
Message 46668 - Posted: 21 Jul 2013, 23:15:38 UTC - in response to Message 46665.  

It isn't related to the recent outage as far as I know. In a sense BOINC is working as expected: a task expires having done nothing, then another task is issued for another computer to see whether it can do any better. But because the HADAM3P deadlines are so long (and not used by the project), some of the download files have been removed. The project staff have tried in the past to mark batches as "not for re-issue" but even then the models re-appear.

Since BOINC itself is unlikely to be the problem, it is probably some unexpected consequence of a CPDN-specific change somewhere. My impression is that the project staff are trying to get back to "vanilla" BOINC, but progress is painfully slow. In that limited sense, there may be a connection with the recent outage, which was caused by changes made as part of that standardisation process.

Ingleside
Joined: 5 Aug 04
Posts: 127
Credit: 24,498,085
RAC: 21,454
Message 46671 - Posted: 22 Jul 2013, 12:45:36 UTC - in response to Message 46667.  

It's all caused by a long period timer somewhere in the BOINC server code.
When the timer reaches the maximum value for that variable type, it overflows to zero, and BOINC thinks that it's a new data set, so it tries to issue it.

It's not an overflow. The BOINC server code includes a safety measure in case the server has somehow overlooked a task. It kicks in when a task reaches 1.5 times its deadline, and it triggers a re-check of whether the WU is finished or a new task is necessary.

Since CPDN doesn't archive "done" WUs and remove them from the database, as other BOINC projects normally do, you'll keep seeing CPDN re-issue ancient WUs until they hit one of their max limits (max error/total).
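
The re-check described above can be sketched roughly as follows. This is not the actual BOINC server code: the class, the field names and the way task age is measured (days since the task was sent) are assumptions for illustration only. Only the 1.5 factor and the idea of max error/total limits come from the description above.

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    finished: bool          # a valid result has already been accepted
    error_results: int      # results that ended in error so far
    total_results: int      # results issued so far
    max_error_results: int  # illustrative per-WU error limit
    max_total_results: int  # illustrative per-WU total limit

def needs_reissue(wu, task_age_days, deadline_days, safety_factor=1.5):
    """Return True if the safety re-check would create a new task."""
    if task_age_days < safety_factor * deadline_days:
        return False  # the safety measure has not kicked in yet
    if wu.finished:
        return False  # the WU is done, nothing to issue
    if wu.error_results >= wu.max_error_results:
        return False  # error limit reached, WU is abandoned
    if wu.total_results >= wu.max_total_results:
        return False  # total limit reached, WU is abandoned
    return True

# An unfinished WU well below its limits gets another task once the
# overlooked task passes 1.5 times its deadline.
wu = WorkUnit(finished=False, error_results=1, total_results=2,
              max_error_results=3, max_total_results=5)
print(needs_reissue(wu, task_age_days=600, deadline_days=365))
```

Under these assumptions, a WU that is never archived keeps passing the check each time a task expires, and only stops once one of the limits is reached.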

If I remember correctly, with more recent server code it's possible to disable the re-issue when the safety limit is hit.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46677 - Posted: 22 Jul 2013, 23:20:35 UTC - in response to Message 46671.  

OK thanks for that. It's been around for so long that I'd forgotten the details. And the discussion was on the defunct php board.
The cure requires a more up-to-date server version, which will be the full-blown vanilla BOINC version 7, sometime in the future.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 47078 - Posted: 17 Sep 2013, 8:00:05 UTC

Reading through this thread tells me that these units are still being re-issued. Shame as it means my netbook still has nothing but WCG tasks to crunch. :(