Message boards : Number crunching : New work discussion - 2
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4536 Credit: 18,993,249 RAC: 21,753 |
A quick note to say that George, one of my fellow mods, has pointed out to me that the NZ testing task I was running was to compare with the EAS ones and will not be resulting in work. If I had been paying more attention I might have twigged that myself. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Chaps, this is getting very off topic. Can we please keep it on point? We want information to be easy to find on these forums. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4536 Credit: 18,993,249 RAC: 21,753 |
Thanks Glen. I have moved a bunch of posts including some of my own to the off topics thread in the cafe section of the forums. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
These 3 WUs started at the same time and they're all at 36-38% progress. They're all running on the same Win7 i5-4690K quadcore CPU with nothing else running. I've restarted BOINC twice and the first was to upgrade to 7.22.2. Set to only allow a single file transfer. Might there be a clue in one working properly and two not, or is it just random? wah2_eas25_a21o_200211_25_994_012218090_0 failed with an error while computing. wah2_eas25_a23h_200211_25_994_012218155_0 is still running nicely with no hung ULs. wah2_eas25_a342_201111_25_994_012219472_0 ULed _12.zip and _restart.zip today but _4.zip still refuses to UL. The lack of feedback suggests they'll just run to failure with no fix forthcoming. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"These 3 WUs started at the same time and they're all at 36-38% progress. They're all running on the same Win7 i5-4690K quadcore CPU with nothing else running. I've restarted BOINC twice and the first was to upgrade to 7.22.2. Set to only allow a single file transfer." wah2_eas25_a23h_200211_25_994_012218155_0 now has _12.zip and _restart.zip hung. Looks like all 3 of my WUs will be failures. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,967,615 RAC: 14,422 |
Looks like the server is down again.
05/07/2023 07:43:48 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error)
05/07/2023 07:43:48 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error)
05/07/2023 07:43:48 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error)
05/07/2023 07:43:48 | climateprediction.net | Temporarily failed upload of wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_25.zip: transient HTTP error
05/07/2023 07:43:48 | climateprediction.net | [file_xfer] project-wide upload delay for 13849.787896 sec
05/07/2023 07:43:48 | climateprediction.net | Backing off 03:46:21 on upload of wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_25.zip
05/07/2023 07:43:48 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error)
05/07/2023 07:43:48 | climateprediction.net | Temporarily failed upload of wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_out.zip: transient HTTP error
05/07/2023 07:43:48 | climateprediction.net | Backing off 03:28:21 on upload of wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_out.zip
05/07/2023 07:43:50 | | Internet access OK - project servers may be temporarily down. |
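For anyone wanting to keep track of how long each stuck file is being held back, the "Backing off" lines in an event log like the one above can be pulled out with a few lines of Python. This is only a minimal sketch: the function name `parse_backoffs` and the `SAMPLE` text are made up for illustration, and the pattern assumes the log wording stays exactly as shown in this post.

```python
import re
from datetime import timedelta

# Matches event-log text of the form
# "Backing off HH:MM:SS on upload of <file>" (wording taken from the log above).
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_backoffs(log_text):
    """Return {filename: timedelta} for every upload back-off found in log_text."""
    result = {}
    for hours, minutes, seconds, name in BACKOFF_RE.findall(log_text):
        result[name] = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=int(seconds)
        )
    return result


# Two lines copied from the log in this post, used as illustrative input.
SAMPLE = (
    "05/07/2023 07:43:48 | climateprediction.net | Backing off 03:46:21 on upload of "
    "wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_25.zip\n"
    "05/07/2023 07:43:48 | climateprediction.net | Backing off 03:28:21 on upload of "
    "wah2_eas25_a1t4_200011_25_994_012217782_2_r734268285_out.zip\n"
)

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE).items():
        print(f"{name}: next retry in {delay}")
```

If the same file backs off more than once in the text, only the most recent delay is kept, which is usually the one of interest.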
Send message Joined: 15 May 09 Posts: 4536 Credit: 18,993,249 RAC: 21,753 |
"Looks like the server is down again." Working again for me at least. If still a problem, could you enable http debug in the event log options and check that they are going to the same server. We did have at least one batch a long time ago when the last zip and the out file went to a different server. |
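Besides the event log options dialog in the Manager, the same debug flags can, as far as I know, be set in the `<log_flags>` section of `cc_config.xml` in the BOINC data directory. The sketch below only builds and prints such a fragment; it does not touch any file. The helper name `build_cc_config` is hypothetical, and the choice of `http_debug` plus `file_xfer_debug` is an assumption based on the `[http]` and `[file_xfer]` tags seen in the logs in this thread.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config.xml fragment that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed flag names; merge by hand into any existing cc_config.xml
    # rather than overwriting it.
    print(build_cc_config(["http_debug", "file_xfer_debug"]))
```

The client needs to re-read its config files (or be restarted) before the extra log lines appear, and the flags are best switched off again afterwards because the output is very verbose.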
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
05.07.2023 19:15:08 | climateprediction.net | Started upload of wah2_eas25_a322_201011_25_994_012219400_0_r1988609399_restart.zip 05.07.2023 19:15:09 | climateprediction.net | [http] [ID#36955] Info: Connection 12581 seems to be dead 05.07.2023 19:15:09 | climateprediction.net | [http] [ID#36955] Info: Closing connection 12581 05.07.2023 19:15:09 | climateprediction.net | [http] [ID#36955] Info: schannel: shutting down SSL/TLS connection with root.ithena.net port 443 05.07.2023 19:15:09 | climateprediction.net | [http] [ID#36955] Info: Hostname in DNS cache was stale, zapped 05.07.2023 19:15:09 | climateprediction.net | [http] [ID#36955] Info: Trying 141.223.16.156:80... 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#12582) 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Host: upload7.cpdn.org 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2) 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept: */* 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept-Encoding: deflate, gzip 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept-Language: ru 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Content-Length: 318 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: Content-Type: application/x-www-form-urlencoded 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Sent header to server: 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Info: We are completely uploaded and fine 05.07.2023 19:15:10 | climateprediction.net | 
[http] [ID#36955] Received header from server: HTTP/1.1 200 OK 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: Date: Wed, 05 Jul 2023 16:28:35 GMT 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: Server: Apache/2.2.3 (CentOS) 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: Transfer-Encoding: chunked 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: Content-Type: text/plain; charset=UTF-8 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: 64 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: <data_server_reply> 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: <status>0</status> 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: <file_size>97517568</file_size> 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: </data_server_reply> 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: 0 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Received header from server: 05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Info: Connection #12582 to host upload7.cpdn.org left intact 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Info: Found bundle for host: 0x33e1dd0 [serially] 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Info: Re-using existing connection #12582 with host upload7.cpdn.org 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: POST /cgi-bin/file_upload_handler 
HTTP/1.1 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Host: upload7.cpdn.org 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2) 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept: */* 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept-Encoding: deflate, gzip 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Accept-Language: ru 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Content-Length: 37476527 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Content-Type: application/x-www-form-urlencoded 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: Expect: 100-continue 05.07.2023 19:15:11 | climateprediction.net | [http] [ID#36955] Sent header to server: 05.07.2023 19:15:12 | climateprediction.net | [http] [ID#36955] Received header from server: HTTP/1.1 100 Continue 05.07.2023 19:15:40 | climateprediction.net | [http] [ID#36955] Info: Recv failure: Connection was reset 05.07.2023 19:15:40 | climateprediction.net | [http] [ID#36955] Info: Closing connection 12582 05.07.2023 19:15:40 | climateprediction.net | [http] HTTP error: Failure when receiving data from the peer 05.07.2023 19:15:41 | | Project communication failed: attempting access to reference site 05.07.2023 19:15:41 | | [http] HTTP_OP::init_get(): https://www.google.com/ 05.07.2023 19:15:41 | climateprediction.net | Temporarily failed upload of wah2_eas25_a322_201011_25_994_012219400_0_r1988609399_restart.zip: transient HTTP error 05.07.2023 19:15:41 | climateprediction.net | Backing off 03:00:53 on upload of wah2_eas25_a322_201011_25_994_012219400_0_r1988609399_restart.zip 05.07.2023 19:15:41 | | [http] [ID#0] Info: Trying 108.177.14.147:443... 
05.07.2023 19:15:41 | | [http] [ID#0] Info: Connected to www.google.com (108.177.14.147) port 443 (#12584) 05.07.2023 19:15:41 | | [http] [ID#0] Info: schannel: disabled automatic use of client certificate 05.07.2023 19:15:41 | | [http] [ID#0] Info: ALPN: offers http/1.1 05.07.2023 19:15:41 | | [http] [ID#0] Info: ALPN: server accepted http/1.1 05.07.2023 19:15:41 | | [http] [ID#0] Info: using HTTP/1.1 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: GET / HTTP/1.1 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: Host: www.google.com 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2) 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: Accept: */* 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: Accept-Language: ru 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: </project_name> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <name>wah2_eas25_a322_201011_25_994_012219400_0_r1988609399_restart.zip</name> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <nbytes>134993599.000000</nbytes> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <max_nbytes>150000000.000000</max_nbytes> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <status>1</status> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <persistent_file_xfer> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <num_retries>16</num_retries> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <first_request_time>1688441143.446878</first_request_time> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <next_request_time>1688584595.124421</next_request_time> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <time_so_far>809.837883</time_so_far> 05.07.2023 
19:15:41 | | [http] [ID#0] Sent header to server: <last_bytes_xferred>97845248.000000</last_bytes_xferred> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: <is_upload>1</is_upload> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: </persistent_file_xfer> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: </file_transfer> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: </file_transfers> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: </boinc_gui_rpc_reply> 05.07.2023 19:15:41 | | [http] [ID#0] Sent header to server: 05.07.2023 19:15:41 | | [http] [ID#0] Info: schannel: remote party requests renegotiation 05.07.2023 19:15:41 | | [http] [ID#0] Info: schannel: renegotiating SSL/TLS connection 05.07.2023 19:15:41 | | [http] [ID#0] Info: schannel: SSL/TLS connection renegotiated 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: HTTP/1.1 200 OK 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Date: Wed, 05 Jul 2023 16:15:41 GMT 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Expires: -1 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Cache-Control: private, max-age=0 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Content-Type: text/html; charset=windows-1251 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Content-Security-Policy-Report-Only: object-src 'none';base-uri 'self';script-src 'nonce-VbizMFrvu8jN0cx_pt3Knw' 'strict-dynamic' 'report-sample' 'unsafe-eval' 'unsafe-inline' https: http:;report-uri https://csp.withgoogle.com/csp/gws/other-hp 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info." 
05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Content-Encoding: gzip 01.01.1970 3:00:00 | | 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: X-XSS-Protection: 0 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: X-Frame-Options: SAMEORIGIN 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Set-Cookie: 1P_JAR=2023-07-05-16; expires=Fri, 04-Aug-2023 16:15:41 GMT; path=/; domain=.google.com; Secure 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Set-Cookie: AEC=Ad49MVEBPns8tmuPQiw45LesLgq20yCFJoPBHufHsCNL93nZLmdEWfvmYg; expires=Mon, 01-Jan-2024 16:15:41 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Set-Cookie: NID=511=E6vYlWfJ6uTfRmTbx-T3OSdIAc41T90rWec9L8lSJdnq2wtYjIGAfM0l2Pa5Wq46ZAfyJ5qR5d-hJ-K17sxRZym0J-DeVEq_A2H8wIBYWStqtSsVq-XIKyrdX-O1u-mwrxe9odEsDZCypfvLVJrlotpuCHCKmHTnpGTJZ-lE5tE; expires=Thu, 04-Jan-2024 16:15:41 GMT; path=/; domain=.google.com; HttpOnly 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: Transfer-Encoding: chunked 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 00000001 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 00000001 05.07.2023 19:15:41 | | 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 00000001 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 05.07.2023 19:15:41 | | [http] [ID#0] Received header from server: 00000001 05.07.2023 19:15:41 | | [http] [ID#0] Info: Connection #12584 to host www.google.com left intact 05.07.2023 19:15:42 | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 15 May 09 Posts: 4536 Credit: 18,993,249 RAC: 21,753 |
OK, still upload7. So that possibility has been ruled out, and it was probably a silly idea anyway as there have been tasks successfully completed and uploaded by others. |
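Checking which server the uploads go to does not have to be done by eye. With http debug switched on, the "Connected to ..." lines can be filtered out of the event log, roughly as sketched here. The function name `upload_hosts` and the `SAMPLE` text are illustrative only, and the pattern assumes the log wording shown in the post above; the client's connectivity check against www.google.com is skipped.

```python
import re
from collections import Counter

# Matches "Info: Connected to <host> (<ipv4>) port <n>" as seen in the [http] debug lines.
CONNECT_RE = re.compile(
    r"Info: Connected to (\S+) \((\d{1,3}(?:\.\d{1,3}){3})\) port (\d+)"
)


def upload_hosts(log_text, reference_hosts=("www.google.com",)):
    """Count connections per (host, ip, port), ignoring reference-site checks."""
    counts = Counter()
    for host, ip, port in CONNECT_RE.findall(log_text):
        if host not in reference_hosts:
            counts[(host, ip, int(port))] += 1
    return counts


# Two lines copied from the debug log in this thread, used as illustrative input.
SAMPLE = (
    "05.07.2023 19:15:10 | climateprediction.net | [http] [ID#36955] Info: "
    "Connected to upload7.cpdn.org (141.223.16.156) port 80 (#12582)\n"
    "05.07.2023 19:15:41 | | [http] [ID#0] Info: "
    "Connected to www.google.com (108.177.14.147) port 443 (#12584)\n"
)

if __name__ == "__main__":
    for (host, ip, port), n in upload_hosts(SAMPLE).items():
        print(f"{host} ({ip}) port {port}: {n} connection(s)")
```

If more than one host shows up for the same task's zip files, that would point to the different-server situation mentioned earlier.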
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Update on EAS batch: feedback from today's CPDN tech meeting. CPDN have found issues with the batch sorting that they believe explain why some of the uploads are not completing. Due to the high number of failures with this batch it will be closed today. More work will be done to understand the failures, and the modified batch will be resubmitted at a later date. --- CPDN Visiting Scientist |
Send message Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738 |
I hope that they allow the currently running work to finish. I have 3 tasks running and they need 2-5 days before they are completed. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"I hope that they allow the currently running work to finish. I have 3 tasks running and they need 2-5 days before they are completed." Yes, all currently running tasks are unaffected and will be awarded credit. Apologies, I should have said. Previously the batch was suspended so it could have been restarted, but it's apparent it's not working and so the decision was made in today's meeting to close it. --- CPDN Visiting Scientist |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Yes all currently running tasks are unaffected and will be awarded credit. Apologies I should have said." I'm not going to run them just for credit. Are they still of use to you? You said before they could be used to compare with the new ones. Is this still the case? If you won't gain anything from them, I'll abort them and run those 25 cores on another project while you get the new ones ready. |
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
What will happen if I abort the upload and continue crunching? Will the task be accepted or will it be rejected? |
Send message Joined: 15 May 09 Posts: 4536 Credit: 18,993,249 RAC: 21,753 |
"What will happen if I abort upload and continue crunching?" Credit will still be granted for the trickle-up files sent, which go to a different server and are not affected by the problems with upload7. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"What will happen if I abort upload and continue crunching?" "Credit will still be granted for the trickle up files sent which go to a different server and are not affected by the problems with upload7" But if one file is missing, won't that make the results useless to the scientists? |
Send message Joined: 5 Jun 09 Posts: 97 Credit: 3,735,198 RAC: 4,318 |
Work units with an incomplete result set will have to be sent out to someone else who will, hopefully, be able to run to completion. This is just the same as any other incomplete work unit. The scientists may be able to use the results returned during any trickle-ups, but I suspect that will depend on what parts of the complete results the trickle-ups represent. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"Work units with an incomplete result set will have to be sent out to someone else who will, hopefully, be able to run to completion. This is just the same as any other incomplete work unit." As this EAS batch has been closed, if a task fails in the workunit, no more tasks from that workunit will be sent. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Work units with an incomplete result set will have to be sent out to someone else who will, hopefully, be able to run to completion. This is just the same as any other incomplete work unit." Is there some reason I can't do 3 trickles, then you get given it when I crash it, and start from where I left off? |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"As this EAS batch has been closed, if a task fails in the workunit, no more tasks from that workunit will be sent." Should we abort tasks in progress? Are they any use? |
©2024 cpdn.org