climateprediction.net (CPDN) home page
Thread 'Batch 996 Weather@Home2 East Asia25'

Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69741 - Posted: 10 Oct 2023, 14:24:17 UTC - in response to Message 69736.  

I did that after I read your message but didn't find anything. Heard back from Andy: he's never seen that before.
@Glenn

I should have said

using the Advanced search link at the top of the forum,

that is how you would get to the search I was talking about.

---
CPDN Visiting Scientist
ID: 69741
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,704,964
RAC: 9,670
Message 69742 - Posted: 10 Oct 2023, 14:45:42 UTC - in response to Message 69741.  

Thread 7592, specifically message 46161?
ID: 69742
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69744 - Posted: 10 Oct 2023, 16:09:16 UTC - in response to Message 69742.  

Yes, I saw that one. But all it tells me is there's something wrong with writing files to a device. I/O should be buffered normally but there are places in the code where it tries to force a flush. But even that is only a hint to the OS which can choose to ignore it. If it was me getting those errors, I'd check the device health as a first step.
Thread 7592, specifically message 46161?
ID: 69744
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 69745 - Posted: 10 Oct 2023, 16:13:18 UTC - in response to Message 69713.  

FWIW, I have 7 zips that cannot upload. "transient HTTP error"
Andy's just informed me that he's restarted the httpd server on the Korean machine. It was running & not out of space, but there were rather a lot of uploads and most likely stale connections. Hope that's got stuck uploads moving again.

If it misbehaves again, pls post it here.


I have 9 now stuck. I just now tried to upload, with no success:

1182081	climateprediction.net	10/10/2023 9:08:03 AM	Started upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip	
1182214	climateprediction.net	10/10/2023 9:08:52 AM	Temporarily failed upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip: transient HTTP error	
1182215	climateprediction.net	10/10/2023 9:08:52 AM	Backing off 05:00:54 on upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip	
1182216			10/10/2023 9:08:53 AM	Project communication failed: attempting access to reference site	
1182217			10/10/2023 9:08:54 AM	Internet access OK - project servers may be temporarily down.	
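
For anyone wanting to see which files keep hitting this, the failures can be tallied per file name from event-log text. This is only a minimal sketch: `count_transient_errors` is a hypothetical helper name (not part of BOINC), and it assumes the log lines have the same wording as the extract above.

```shell
# Hypothetical helper (not part of BOINC): read event-log text on stdin and
# print "<file name> <number of transient HTTP error failures>" per file.
count_transient_errors() {
    awk '/Temporarily failed upload of .*: transient HTTP error/ {
        sub(/.*Temporarily failed upload of /, "")   # drop everything before the file name
        sub(/: transient HTTP error.*/, "")          # drop the error text after it
        count[$0]++
    }
    END { for (f in count) print f, count[f] }'
}

# Example using one line copied from the log extract above:
printf '%s\n' \
  '1182214	climateprediction.net	10/10/2023 9:08:52 AM	Temporarily failed upload of wah2_eas25_a2h5_200112_24_996_012226757_2_r1545933914_8.zip: transient HTTP error	' \
  | count_transient_errors
```

Feeding it the output of `boinccmd --get_messages` should work as well, assuming the client prints the same message text there.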
ID: 69745
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 69747 - Posted: 10 Oct 2023, 19:43:00 UTC
Last modified: 10 Oct 2023, 19:43:33 UTC

I have let the project know along with the crucial extract from an event log. See this post.
ID: 69747
rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 69750 - Posted: 10 Oct 2023, 20:28:09 UTC - in response to Message 69734.  

Thanks - I'll put that idea in the red-herring bin
ID: 69750
ChelseaOilman
Joined: 24 Dec 19
Posts: 32
Credit: 40,998,402
RAC: 76,535
Message 69754 - Posted: 10 Oct 2023, 22:07:44 UTC - in response to Message 69745.  

[quote]I have 9 now stuck. I just now tried to upload, with no success:[/quote]

I feel your pain. I've resigned myself to the reality this issue isn't going to get fixed.
ID: 69754
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 69755 - Posted: 11 Oct 2023, 5:38:31 UTC
Last modified: 11 Oct 2023, 5:48:25 UTC

I have 9 now stuck. I just now tried to upload, with no success:
What I would really like to understand is why some don't seem to have any problems. Is it just random or is there a pattern that neither I nor anyone else is seeing?

Edit: My message has been passed from the researcher to those maintaining the server. Unfortunately, this issue may be so esoteric that it might not help much.
ID: 69755
bibi
Joined: 22 Dec 08
Posts: 7
Credit: 21,872,556
RAC: 27,990
Message 69756 - Posted: 11 Oct 2023, 6:24:18 UTC

From Germany. I counted them:
$ for host in r r2 r5 pc; do echo "$host $(bnc $host --get_file_transfers | grep -c wah2)"; done
r 48
r2 42
r5 38
pc 3

Sometimes I see a few uploads, but mostly stuck while uploading.
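
Totalling the per-host counts takes one more awk step. A minimal sketch, assuming the loop output is in the "host count" form shown above; `sum_counts` is a hypothetical helper name, not a BOINC command:

```shell
# Hypothetical helper: sum the second column of "host count" lines on stdin.
sum_counts() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Example with the four counts posted above (48 + 42 + 38 + 3):
printf '%s\n' 'r 48' 'r2 42' 'r5 38' 'pc 3' | sum_counts   # prints 131
```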
ID: 69756
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 69758 - Posted: 11 Oct 2023, 11:09:06 UTC
Last modified: 11 Oct 2023, 11:25:21 UTC

From the researcher
Hi Dave, an IT staff member told me there isn’t any change in bandwidth settings (port 50000-51000) for the Korean server, so it should be (physically) open for any user as usual. Since he wants to investigate further, could you provide me more information? I will hand it over to him. *Information on the uploader having an issue: e.g., its IP address, an upload date, an intended port number, etc.
ID: 69758
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,704,964
RAC: 9,670
Message 69762 - Posted: 11 Oct 2023, 11:49:06 UTC - in response to Message 69758.  

See my comments in the Windows upload thread.
ID: 69762
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 69763 - Posted: 11 Oct 2023, 12:39:47 UTC - in response to Message 69762.  

I have posted your further comments on the Trello board, Richard. My own uploads are all still getting through with only the occasional transient HTTP error, from a bored band connection maxing out at about 120Kb/second with a following wind.
ID: 69763
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 69765 - Posted: 11 Oct 2023, 13:27:17 UTC

In addition to 12 zip files, I now have a completed task that cannot upload.
ID: 69765
Ingleside
Joined: 5 Aug 04
Posts: 127
Credit: 24,445,900
RAC: 24,084
Message 69768 - Posted: 11 Oct 2023, 14:50:21 UTC

Since I apparently overlooked a Windows 10 update, 15 tasks crapped out after the unexpected re-boot.

14 errored-out with "Signal 11 received: Segment violation" but one of them strangely enough also had "The system cannot find the drive specified. (0xf) - exit code 15 (0xf)"

One of them had "The access code is invalid. (0xc) - exit code 12 (0xc)"

All of them had at least 1 trickle, meaning these weren't WUs that errored out at initial startup.
ID: 69768
Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 69769 - Posted: 11 Oct 2023, 14:52:12 UTC

I've experienced an unexpected power failure today. My two hosts were both running an EAS task while this happened. Both seem to have survived the abnormal shutdown and are now crunching ahead.

The Win10 host's task also survived the Patch Tuesday restart earlier this morning.
ID: 69769
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,984,181
RAC: 14,575
Message 69773 - Posted: 11 Oct 2023, 22:13:55 UTC
Last modified: 11 Oct 2023, 22:15:03 UTC

Lost 9 of my 12 tasks following a "planned" reboot. 3 unexplained but the rest all sig 11 seg violation. One resend picked up this morning also failed sig 11 seg violation.
ID: 69773
ChelseaOilman
Joined: 24 Dec 19
Posts: 32
Credit: 40,998,402
RAC: 76,535
Message 69774 - Posted: 11 Oct 2023, 23:47:46 UTC

What are these restart zip files I'm seeing?
ID: 69774
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 69775 - Posted: 12 Oct 2023, 0:34:51 UTC - in response to Message 69712.  
Last modified: 12 Oct 2023, 0:36:43 UTC

Well all three of my tasks crashed after uploading 10 trickles each. My machine got another task and it crashed after uploading a single trickle.
I cannot tell what really went wrong with any of them.

My machine is
Computer ID 1512658, and the tasks were:

22340449
22339081
22339022
22346116
ID: 69775
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 69776 - Posted: 12 Oct 2023, 6:11:01 UTC - in response to Message 69774.  

What are these restart zip files I'm seeing?
They are files generated by a lot of CPDN tasks that I think can be used to generate further tasks. They don't, however, always get used. More often than not they are generated at the end of a task rather than halfway through, as in this and the previous batch.
ID: 69776
rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 69778 - Posted: 12 Oct 2023, 8:38:35 UTC

While I was out last night another signal 11 arrived and departed.
https://www.cpdn.org/result.php?resultid=22346439

At the same time three other tasks continued the long plod towards completion.

Haul of failure since 5th October
SIGNAL 11 = 6 (runtime ~ 2 minutes)
"restart" failure/ signal 11 = 6 (runtime >3 minutes)
(One of these https://www.cpdn.org/result.php?resultid=22337980 was not associated with a shutdown/restart cycle, but failed ~20 minutes after first start.)

Only 3 tasks of the 15 received have any chance of reaching completion, I'll keep the PC (and BOINC) running until they have finished.
ID: 69778

©2024 cpdn.org