Thread 'Weather at Home still running? Can't send back files.'

Message boards : Number crunching : Weather at Home still running? Can't send back files.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 67916 - Posted: 20 Jan 2023, 1:31:53 UTC

I got a retread which was originally sent a few months ago, but it hasn't uploaded any files for days. Is there any point completing it?

Temporarily failed upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_1.zip: transient HTTP error

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 67918 - Posted: 20 Jan 2023, 6:15:41 UTC - in response to Message 67916.  

Looks like another message to Andy is required, who will then nudge those who manage the server in Hobart, Tasmania. This server is known to have frequent problems and is managed by a data server on behalf of the scientists, not by the project in Oxford.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 67927 - Posted: 20 Jan 2023, 20:46:21 UTC
Last modified: 20 Jan 2023, 21:23:25 UTC

The worrying thing is it's trying to send zips 1, 2, 7, 8, 9. I hope 3 to 6 magically got through and weren't lost somewhere.

Correction: I forgot there was a long-term log. It seems some got lucky.

13-Jan-2023 20:12:17 [climateprediction.net] Started upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip
13-Jan-2023 20:14:27 [climateprediction.net] Finished upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip
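A long-term log like this can be scanned for "Finished upload of" lines to see which zip numbers actually got through. The following is only a minimal Python sketch, assuming the log lines look like the two quoted above and that each file name ends in `_<n>.zip`; the function names `finished_zip_numbers` and `missing_zip_numbers` are made up for illustration and are not part of BOINC.

```python
import re

# Matches e.g. "... Finished upload of <task>_3.zip" and captures the task
# prefix and the trailing zip number (assumes names end in _<n>.zip).
FINISHED = re.compile(r"Finished upload of (\S+?)_(\d+)\.zip")


def finished_zip_numbers(log_text, task_prefix):
    """Return the sorted zip numbers the log reports as uploaded for one task."""
    numbers = set()
    for line in log_text.splitlines():
        match = FINISHED.search(line)
        if match and match.group(1) == task_prefix:
            numbers.add(int(match.group(2)))
    return sorted(numbers)


def missing_zip_numbers(finished, highest):
    """Zip numbers in 1..highest that have no 'Finished upload' line."""
    return [n for n in range(1, highest + 1) if n not in finished]


sample = """\
13-Jan-2023 20:12:17 [climateprediction.net] Started upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip
13-Jan-2023 20:14:27 [climateprediction.net] Finished upload of wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_3.zip
"""
task = "wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875"
done = finished_zip_numbers(sample, task)
print("finished:", done)
print("no finish line for:", missing_zip_numbers(done, 9))
```

A zip with no "Finished upload" line is not necessarily lost; it may simply predate the part of the log being read.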

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 68018 - Posted: 24 Jan 2023, 18:15:46 UTC - in response to Message 67927.  

Have the zips been able to get through yet? Andy has checked and the server appears to be responding normally and isn't full.

Ryusennin
Joined: 7 Dec 06
Posts: 3
Credit: 968,530
RAC: 333
Message 68110 - Posted: 29 Jan 2023, 23:34:33 UTC

2023-01-30 00:17:00 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip
2023-01-30 00:17:00 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip
2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip: transient HTTP error
2023-01-30 00:22:07 | climateprediction.net | Backing off 04:28:36 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip
2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip: transient HTTP error
2023-01-30 00:22:07 | climateprediction.net | Backing off 03:57:56 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_19.zip
2023-01-30 00:22:07 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_21.zip
2023-01-30 00:22:07 | climateprediction.net | Started upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_23.zip
2023-01-30 00:22:09 | | Project communication failed: attempting access to reference site
2023-01-30 00:22:10 | | Internet access OK - project servers may be temporarily down.

It's been stuck for days. I'm on optic fiber.
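The failures and back-off times in a log like the one above can be tallied per file rather than read by eye. This is a rough Python sketch, assuming the pipe-separated "time | project | message" layout and the exact message wording shown in this post; the function name `summarize_uploads` is hypothetical.

```python
from collections import Counter
from datetime import timedelta

FAILED = "Temporarily failed upload of "
BACKOFF = "Backing off "


def summarize_uploads(log_text):
    """Count failed uploads and record the latest back-off delay per file."""
    failures = Counter()
    backoff = {}
    for line in log_text.splitlines():
        # Expect "timestamp | project | message"; skip anything else.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        message = parts[2]
        if message.startswith(FAILED):
            name = message[len(FAILED):].split(":", 1)[0]
            failures[name] += 1
        elif message.startswith(BACKOFF):
            # "Backing off HH:MM:SS on upload of <file>"
            delay, _, name = message[len(BACKOFF):].partition(" on upload of ")
            h, m, s = (int(x) for x in delay.split(":"))
            backoff[name] = timedelta(hours=h, minutes=m, seconds=s)
    return failures, backoff


sample = """\
2023-01-30 00:22:07 | climateprediction.net | Temporarily failed upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip: transient HTTP error
2023-01-30 00:22:07 | climateprediction.net | Backing off 04:28:36 on upload of wah2_nz25_a3gq_201505_25_936_012154252_0_r1376385003_18.zip
2023-01-30 00:22:09 | | Project communication failed: attempting access to reference site
"""
failures, backoff = summarize_uploads(sample)
for name, count in failures.items():
    print(name, "failed", count, "time(s); next retry in", backoff.get(name))
```

A high failure count with steadily growing back-off times points at the upload server rather than the local connection, which fits the "Internet access OK" line in the log.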

Thomas McFarland
Joined: 28 Feb 05
Posts: 20
Credit: 11,168,717
RAC: 18,157
Message 68111 - Posted: 29 Jan 2023, 23:37:39 UTC - in response to Message 68018.  

I also have 3 WAH WU's running on a Windows computer and I haven't seen it upload anything in days. Is this the same issue that is affecting OpenIFS? I seem to be having at least some luck with that.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 68112 - Posted: 30 Jan 2023, 1:11:28 UTC - in response to Message 68111.  

> I also have 3 WAH WU's running on a Windows computer and I haven't seen it upload anything in days. Is this the same issue that is affecting OpenIFS? I seem to be having at least some luck with that.


Those are uploaded to a different server in Hobart, Tasmania. Over the years, it has been periodically unreliable. I e-mailed Andy, so hopefully he can communicate with them and get this resolved.

Thomas McFarland
Joined: 28 Feb 05
Posts: 20
Credit: 11,168,717
RAC: 18,157
Message 68113 - Posted: 30 Jan 2023, 2:06:36 UTC - in response to Message 68112.  

Hmm. I didn't think it was, which is why I asked. Strange to be having the same problem at the same time. The probability of that seems rather low.

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 68117 - Posted: 30 Jan 2023, 11:34:36 UTC

This was raised in the tech meeting with CPDN this morning. They are in contact with the scientist in NZ about this. It might be intermittent, as when CPDN check, the server seems to be accepting uploads. However, they are going to look at redirecting uploads to a different server in the UK and then sending the data to NZ after the uploads have completed.

If this is still happening in a couple of days, report here and I or one of the moderators can report back to CPDN directly.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 68202 - Posted: 4 Feb 2023, 16:52:29 UTC
Last modified: 4 Feb 2023, 16:52:59 UTC

I see some Windows tasks have completed in the last couple days. Has anyone in this thread reporting upload problems for WAH2 NZ tasks had their tasks upload?

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68203 - Posted: 4 Feb 2023, 16:59:01 UTC
Last modified: 4 Feb 2023, 17:01:44 UTC

My files have all gone through. It was taking 30 attempts for each trickle, and had about 4 queued at any one time (on a slow machine doing just one task).

But I just checked the task on the server and it says no trickles? It's 85.545% complete here, and has sent 21 trickles, the last one being wah2_nz25_a0ha_199005_25_936_012150384_1_r1101737875_21.zip

https://www.cpdn.org/result.php?resultid=22296415

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 68204 - Posted: 4 Feb 2023, 17:09:05 UTC

There have been intermittent problems with the NZ upload server. Andy told me he was looking at it last week - CPDN are in touch with the NZ project scientist about switching to another server. So they know about the problem and are monitoring it.

David Berg
Joined: 2 Jul 15
Posts: 21
Credit: 4,210,483
RAC: 1,526
Message 68433 - Posted: 24 Feb 2023, 8:17:23 UTC

I have seen no wah tasks come across to my system. Could something be blocking them? Thx.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68434 - Posted: 24 Feb 2023, 9:00:00 UTC

What's blocking it is the lack of work by the scientists producing them.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 68436 - Posted: 24 Feb 2023, 10:47:15 UTC - in response to Message 68433.  

> I have seen no wah tasks come across to my system. Could something be blocking them? Thx.
As Peter says, there have been no WAH batches of work for some time, which means no native work for Windows. It is possible to get work when it appears by using WSL or VirtualBox to run a Linux distribution such as Ubuntu and then installing BOINC in that. The next work is most likely to be OpenIFS, which is for Linux and requires about 8GB/core before taking overheads off (a bit less on a very lean system). The batch I am talking about will be a rerun of the most recent batch, which were all failing, most probably due to an error in the task data files.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69999 - Posted: 28 Oct 2023, 3:32:01 UTC

NZ server needs kicking. Cannot upload.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 70000 - Posted: 28 Oct 2023, 6:52:59 UTC - in response to Message 69999.  

> NZ server needs kicking. Cannot upload.

That will be Monday then, when Andy is back in to message Hobart.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70001 - Posted: 28 Oct 2023, 14:54:01 UTC - in response to Message 70000.  

>> NZ server needs kicking. Cannot upload.
> That will be Monday then, when Andy is back in to message Hobart.
I take it they still want the task? It's a retread originally sent on 11th July:
https://www.cpdn.org/workunit.php?wuid=12221085

While looking, I noticed a mess. One of my two xeon machines corrupted its hard disk, so I cloned it from the other one, assuming Boinc would eventually sort things out. Worst case scenario, things would be run twice, and I thought if one crashed a task, the other would succeed. But what's actually happened is that, between these two machines, your server thinks I've got 6 tasks in progress when I only have 2 (on those machines; I have others elsewhere). There was nothing I could do about the ones which were on the corrupted disk, as that machine couldn't boot up to tell the server. If possible, could you cancel the others so they get sent out again? Otherwise they're going to be allocated to me for a year.

https://www.cpdn.org/results.php?hostid=1544690&offset=0&show_names=0&state=1&appid=

I'm running WU 12221085 and another WU not listed! I assume that one was running on both machines, but I don't know how they both lost it with the server thinking I still have it.

WUs 12228405, 12228965, 12224132, 12229600, 12224528 are not on either of my xeon machines. The best thing presumably is if you can cancel those, and I'll cancel the one I've got which your server doesn't know about?

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 70002 - Posted: 28 Oct 2023, 15:52:41 UTC - in response to Message 70001.  

Did you clone the CPDN HostID when you cloned the hard disk? Server databases don't like duplicated primary key entries in their tables.

Bear that in mind next time.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70003 - Posted: 28 Oct 2023, 17:30:38 UTC - in response to Message 70002.  
Last modified: 28 Oct 2023, 17:39:07 UTC

> Did you clone the CPDN HostID when you cloned the hard disk? Server databases don't like duplicated primary key entries in their tables.
>
> Bear that in mind next time.
I just cloned the entire disk. Clone software doesn't really have an option to change Boinc settings.

I couldn't remember the complex way Boinc deals with such things. I thought it noticed a duplicate, decided which was the new one, and started fresh with it. So I should delete the ID and let it make another? I guess I'd have to stop Boinc auto-starting (not even sure where that setting is), then do the clone, then change the host ID, then put Boinc back on auto-start. Then what happens to the tasks already on (now both) machines? This all sounds very complicated.

Couldn't Boinc notice when it boots up the hardware has changed and ask me if I want to change ID? If I'd just done an upgrade, I could say no. If it was a clone, I could say so, then Boinc could wipe the clone's queue and start with a fresh ID.

Perhaps:
Turn off network communication in Boinc.
Make clone.
Turn on network communication in original.
On clone, delete all tasks then shut down Boinc.
Remove ID from config file.
Start up Boinc and turn on network communication.
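The "remove ID from config file" step could be sketched roughly as below. This is a hedged Python illustration only: it assumes the host ID sits in a `<hostid>` element inside each `<project>` block of an XML state file, and the sample document, URL and ID are invented for the example. Whether the real file uses this layout, and whether the server would then issue a fresh ID rather than match the machine to the old one, is not confirmed anywhere in this thread.

```python
import xml.etree.ElementTree as ET


def strip_host_ids(state_xml):
    """Return the XML text with every <hostid> removed from <project> blocks.

    Assumption: the ID lives in <project><hostid>...</hostid></project>.
    """
    root = ET.fromstring(state_xml)
    for project in root.iter("project"):
        for hostid in project.findall("hostid"):
            project.remove(hostid)
    return ET.tostring(root, encoding="unicode")


# Invented sample; not copied from a real client state file.
sample = """<client_state>
  <project>
    <master_url>https://example.org/</master_url>
    <hostid>12345</hostid>
  </project>
</client_state>"""

cleaned = strip_host_ids(sample)
print(cleaned)
```

Any edit like this would only make sense with the client stopped and a backup of the original file kept, since the client rewrites its own state while running.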

However, the main problem I ended up with was the tasks on the corrupted disk. There's no way I can tell the server to hand them out to someone else. Or does the "new" ID match up with the corrupt disk ID, so the server realises I've "misplaced" the tasks? We're back to this 1 year deadline problem. I thought they were going to fix it?

©2024 cpdn.org