Weather at Home still running? Can't send back files.
Author | Message |
---|---|
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
I got a retread which was originally sent a few months ago, but it hasn't uploaded any files for days. Is there any point completing it? Temporarily failed upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_1.zip: transient HTTP error |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
Looks like another message to Andy is required; he will then nudge those who manage the server in Hobart, Tasmania. This server is known to have frequent problems and is managed by a data server on behalf of the scientists, not by the project in Oxford. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
The worrying thing is that it's trying to send zips 1, 2, 7, 8 and 9. I hope 3 to 6 magically got through and weren't lost somewhere. Correction: I forgot there was a long-term log, and it seems some got lucky. 13-Jan-2023 20:12:17 [climateprediction.net] Started upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip 13-Jan-2023 20:14:27 [climateprediction.net] Finished upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
Have the zips been able to get through yet? Andy has checked, and the server appears to be responding normally and isn't full. |
Joined: 7 Dec 06 Posts: 3 Credit: 968,530 RAC: 333 |
2023-01-30 00:17:00 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip 2023-01-30 00:17:00 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip 2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip: transient HTTP error 2023-01-30 00:22:07 | climateprediction.net | Backing off 04:28:36 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip 2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip: transient HTTP error 2023-01-30 00:22:07 | climateprediction.net | Backing off 03:57:56 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip 2023-01-30 00:22:07 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_21.zip 2023-01-30 00:22:07 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_23.zip 2023-01-30 00:22:09 | | Project communication failed: attempting access to reference site 2023-01-30 00:22:10 | | Internet access OK - project servers may be temporarily down. It's been stuck for days. I'm on optic fiber. |
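For anyone wanting to see at a glance which zips are still stuck, event-log lines in the format quoted above can be summarised with a short script. This is only a rough sketch: the function names `upload_status` and `unfinished_numbers` are made up for illustration, the pattern is based solely on the lines pasted in this thread, and other BOINC versions may word their messages differently.

```python
import re

# Pattern for upload events as they appear in the log pasted above, e.g.
#   2023-01-30 00:22:07 | climateprediction.net | Backing off 04:28:36 on upload of <file>.zip
# Assumption: only these four message types matter for upload state.
EVENT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| [^|]+ \| "
    r"(?:(?P<event>Started|Finished|Temporarily failed)"
    r"|Backing off (?P<backoff>\d{2}:\d{2}:\d{2}) on)"
    r" upload of (?P<file>\S+\.zip)"
)

# A few lines copied from the post above.
SAMPLE = """\
2023-01-30 00:17:00 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip
2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip: transient HTTP error
2023-01-30 00:22:07 | climateprediction.net | Backing off 04:28:36 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip
2023-01-30 00:22:07 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_21.zip
"""


def upload_status(log_text):
    """Return {filename: most recent upload event seen for that file}."""
    status = {}
    for m in EVENT.finditer(log_text):
        # Later events overwrite earlier ones, so the last one wins.
        status[m["file"]] = m["event"] or "Backing off"
    return status


def unfinished_numbers(log_text):
    """Return the sorted zip numbers whose last event is not 'Finished'."""
    numbers = []
    for name, event in upload_status(log_text).items():
        if event != "Finished":
            numbers.append(int(re.search(r"_(\d+)\.zip$", name).group(1)))
    return sorted(numbers)


if __name__ == "__main__":
    for name, event in upload_status(SAMPLE).items():
        print(event, name)
    print("Not yet uploaded:", unfinished_numbers(SAMPLE))
```

Because it uses `finditer` rather than splitting on newlines, it also copes with a log that has been pasted as one long line, as above.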
Joined: 28 Feb 05 Posts: 20 Credit: 11,168,717 RAC: 18,157 |
I also have 3 WAH WUs running on a Windows computer and I haven't seen them upload anything in days. Is this the same issue that is affecting OpenIFS? I seem to be having at least some luck with that. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"I also have 3 WAH WU's running on a Windows computer and I haven't seen it upload anything in days. Is this the same issue that is affecting OpenIFS? I seem to be having at least some luck with that." Those are uploaded to a different server in Hobart, Tasmania. Over the years, it has been periodically unreliable. I e-mailed Andy, so hopefully he can communicate with them and get this resolved. |
Joined: 28 Feb 05 Posts: 20 Credit: 11,168,717 RAC: 18,157 |
Hmm. I didn't think it was, which is why I asked. Strange to be having the same problem at the same time. The probability of that seems rather low. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
This was raised in the tech meeting with CPDN this morning. They are in contact with the scientist in NZ about this. It might be intermittent, as when CPDN check, the server seems to be accepting uploads. However, they are going to look at redirecting uploads to a different server in the UK and then sending the data to NZ after the uploads have completed. If this is still happening in a couple of days, report here and I or one of the moderators can report back to CPDN directly. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I see some Windows tasks have completed in the last couple days. Has anyone in this thread reporting upload problems for WAH2 NZ tasks had their tasks upload? |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
My files have all gone through. It was taking 30 attempts for each trickle, and had about 4 queued at any one time (on a slow machine doing just one task). But I just checked the task on the server and it says no trickles? It's 85.545% complete here, and has sent 21 trickles, the last one being wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_21.zip https://www.cpdn.org/result.php?resultid=22296415 |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
There have been intermittent problems with the NZ upload server. Andy told me he was looking at it last week - CPDN are in touch with the NZ project scientist about switching to another server. So they know about the problem and are monitoring it. |
Joined: 2 Jul 15 Posts: 21 Credit: 4,210,483 RAC: 1,526 |
I have seen no wah tasks come across to my system. Could something be blocking them? Thx. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
What's blocking it is the lack of work by the scientists producing them. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"I have seen no wah tasks come across to my system. Could something be blocking them? Thx." As Peter says, there have been no WAH batches of work for some time, which means no native work for Windows. It is possible to get work when it appears by using WSL or VirtualBox to run a Linux distribution such as Ubuntu and then installing BOINC in that. The next work is most likely to be OpenIFS, which is for Linux and requires about 8 GB per core before taking overheads off, or a bit less on a very lean system. The batch I am talking about will be a rerun of the most recent batch, whose tasks were all failing, most probably due to an error in the task data files. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
NZ server needs kicking. Cannot upload. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"NZ server needs kicking. Cannot upload." That will be Monday then, when Andy is back in to message Hobart. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"NZ server needs kicking. Cannot upload." "That will be Monday then when Andy is back in to Message Hobart." I take it they still want the task? It's a retread originally sent on 11th July: https://www.cpdn.org/workunit.php?wuid=12221085 While looking, I noticed a mess. One of my two Xeon machines corrupted its hard disk, so I cloned it from the other one, assuming BOINC would eventually sort things out. Worst case scenario, things would be run twice, and I thought that if one crashed a task, the other would succeed. But what's actually happened is that (between these two machines of mine) your server thinks I've got 6 tasks in progress, when I only have 2 on those machines (I have others elsewhere). There was nothing I could do about the ones which were on the corrupted disk, as that machine couldn't boot up to tell the server. If possible, could you cancel the others so they get sent out again? Otherwise they're going to be allocated to me for a year. https://www.cpdn.org/results.php?hostid=1544690&offset=0&show_names=0&state=1&appid= I'm running WU 12221085 and another WU not listed! I assume that one was running on both machines, but I don't know how they both lost it with the server thinking I still have it. WUs 12228405, 12228965, 12224132, 12229600 and 12224528 are not on either of my Xeon machines. The best thing presumably is if you can cancel those, and I'll cancel the one I've got which your server doesn't know about? |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,706,621 RAC: 9,524 |
Did you clone the CPDN HostID when you cloned the hard disk? Server databases don't like duplicated primary key entries in their tables. Bear that in mind next time. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Did you clone the CPDN HostID when you cloned the hard disk? Server databases don't like duplicated primary key entries in their tables." I just cloned the entire disk. Clone software doesn't really have an option to change BOINC settings. I couldn't remember the complex way BOINC deals with such things; I thought it noticed a duplicate, decided which was the new one, and started fresh with it. So I should delete the ID and let it make another? I guess I'd have to stop BOINC auto-starting (I'm not even sure where that setting is), then do the clone, then change the host ID, then put BOINC back on auto-start? Then what happens to the tasks already on (now both) machines? This all sounds very complicated. Couldn't BOINC notice when it boots up that the hardware has changed and ask me if I want to change ID? If I'd just done an upgrade, I could say no. If it was a clone, I could say so, and then BOINC could wipe the clone's queue and start with a fresh ID. Perhaps: (1) Turn off network communication in BOINC. (2) Make the clone. (3) Turn on network communication in the original. (4) On the clone, delete all tasks, then shut down BOINC. (5) Remove the ID from the config file. (6) Start up BOINC and turn on network communication. However, the main problem I ended up with was the tasks on the corrupted disk. Is there no way I can tell the server to hand them out to someone else, or does the "new" ID match up with the corrupt disk's ID so that the server realises I've "misplaced" the tasks? We're back to this 1 year deadline problem; I thought they were going to fix it? |
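As a way of checking for the duplicated host ID discussed above before a clone goes back online, something like the following could compare the state files from the two machines. It is a hedged sketch, not an official BOINC tool: it assumes each machine's `client_state.xml` holds a `<hostid>` inside each `<project>` block alongside `<master_url>` (worth confirming against your own file), the function names are invented for illustration, and the sample XML is a cut-down stand-in rather than a real state file. It only reports a clash; it changes nothing.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for a client_state.xml. Assumption: real files keep
# <master_url> and <hostid> inside each <project> element like this.
# The URL is an illustrative value; the host ID is the one linked above.
ORIGINAL = """\
<client_state>
  <project>
    <master_url>https://www.cpdn.org/</master_url>
    <hostid>1544690</hostid>
  </project>
</client_state>
"""

# A whole-disk clone carries an identical state file.
CLONE = ORIGINAL


def host_ids(client_state_xml):
    """Map each project's master URL to the host ID recorded for it."""
    root = ET.fromstring(client_state_xml)
    return {
        project.findtext("master_url", "").strip():
            project.findtext("hostid", "").strip()
        for project in root.iter("project")
    }


def shared_host_ids(state_a, state_b):
    """Return {master_url: hostid} for projects where both machines agree."""
    ids_a, ids_b = host_ids(state_a), host_ids(state_b)
    return {
        url: hostid
        for url, hostid in ids_a.items()
        if hostid and ids_b.get(url) == hostid
    }


if __name__ == "__main__":
    clashes = shared_host_ids(ORIGINAL, CLONE)
    for url, hostid in clashes.items():
        print(f"Both machines report host ID {hostid} for {url}")
    if not clashes:
        print("No duplicated host IDs found")
```

An empty result would suggest the two machines are registered separately; a non-empty one matches the situation described in the posts above.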
©2024 cpdn.org