climateprediction.net (CPDN) home page
Thread 'OpenIFS Discussion'

Message boards : Number crunching : OpenIFS Discussion

PDW
Joined: 29 Nov 17
Posts: 82
Credit: 14,387,344
RAC: 91,190
Message 67026 - Posted: 24 Dec 2022, 11:38:48 UTC - in response to Message 67023.  

Task will get full credit. Glenn did post an explanation a few days ago about tasks that finish successfully but appear to fail. I will see if I can find it later. From what I recall, I didn't read it carefully enough to fully understand it.

Glenn said this in post https://www.cpdn.org/forum_thread.php?id=9162&postid=66949#66949 :

Agreed. I've asked CPDN if there is a way of getting the server to check the upload was received OK to reclassify this as a success. It may not be easy as the uploads go to a cloud server first. Not my expertise.
ID: 67026
[AF] Kalianthys
Joined: 20 Dec 20
Posts: 13
Credit: 40,045,863
RAC: 9,755
Message 67027 - Posted: 24 Dec 2022, 13:55:29 UTC

Thank you Dave and PDW.

Kali.
ID: 67027
[SG]Felix
Joined: 4 Oct 15
Posts: 34
Credit: 9,075,151
RAC: 374
Message 67028 - Posted: 24 Dec 2022, 18:03:58 UTC

It seems like there is enough work for the rest of the year available ;)

Merry Christmas
Felix
ID: 67028
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,960,988
RAC: 14,084
Message 67029 - Posted: 24 Dec 2022, 18:07:47 UTC

Getting transient HTTP errors:

Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_61.zip: transient HTTP error
Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Backing off 00:02:50 on upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_61.zip
Sat 24 Dec 2022 17:54:15 GMT | climateprediction.net | Started upload of oifs_43r3_ps_0873_1981050100_123_950_12167517_0_r1054526626_62.zip
Sat 24 Dec 2022 17:54:16 GMT | | Internet access OK - project servers may be temporarily down.


Guess that's it for the next few days because of the Hols.

Network activity suspended and tasks reduced to 1 until I hear otherwise.

Happy Xmas everyone.
ID: 67029
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 67030 - Posted: 24 Dec 2022, 18:09:54 UTC - in response to Message 67028.  
Last modified: 24 Dec 2022, 18:10:42 UTC

If the network holds up ...

All my uploads have been timing out since about 17:23; tracert gets no further than

 11    18 ms    18 ms    18 ms  ral-r26.ja.net [146.97.41.34]
 12     *        *        *     Request timed out.
ID: 67030
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67031 - Posted: 24 Dec 2022, 18:18:30 UTC - in response to Message 67030.  
Last modified: 24 Dec 2022, 18:32:55 UTC

All my uploads are timing out,


Mine too. But my traceroute seems to work OK.
Problem seems to have started:
Sat 24 Dec 2022 12:24:01 PM EST | climateprediction.net | Started upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_67.zip
Sat 24 Dec 2022 12:24:10 PM EST | climateprediction.net | Computation for task oifs_43r3_ps_0447_1995050100_123_964_12181091_0 finished
Sat 24 Dec 2022 12:24:10 PM EST | climateprediction.net | Starting task oifs_43r3_ps_0257_2002050100_123_971_12187901_0
Sat 24 Dec 2022 12:24:21 PM EST | climateprediction.net | Started upload of oifs_43r3_ps_0160_1996050100_123_965_12181804_0_r1040728371_122.zip
Sat 24 Dec 2022 12:26:03 PM EST |  | Project communication failed: attempting access to reference site
Sat 24 Dec 2022 12:26:03 PM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_67.zip: transient HTTP error
Sat 24 Dec 2022 12:26:03 PM EST | climateprediction.net | Backing off 00:02:10 on upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_67.zip
Sat 24 Dec 2022 12:26:03 PM EST | climateprediction.net | Started upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_107.zip
Sat 24 Dec 2022 12:26:05 PM EST |  | Internet access OK - project servers may be temporarily down.
Sat 24 Dec 2022 12:26:21 PM EST | climateprediction.net | Computation for task oifs_43r3_ps_0160_1996050100_123_965_12181804_0 finished
Sat 24 Dec 2022 12:26:21 PM EST | climateprediction.net | Starting task oifs_43r3_ps_0675_2002050100_123_971_12188319_0
Sat 24 Dec 2022 12:26:23 PM EST |  | Project communication failed: attempting access to reference site
Sat 24 Dec 2022 12:26:23 PM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0160_1996050100_123_965_12181804_0_r1040728371_122.zip: transient HTTP error
Sat 24 Dec 2022 12:26:23 PM EST | climateprediction.net | Backing off 00:02:08 on upload of oifs_43r3_ps_0160_1996050100_123_965_12181804_0_r1040728371_122.zip
Sat 24 Dec 2022 12:26:24 PM EST |  | Internet access OK - project servers may be temporarily down.


$ traceroute 146.97.41.34
traceroute to 146.97.41.34 (146.97.41.34), 30 hops max, 60 byte packets
 1  Fios_Quantum_Gateway.fios-router.home (192.168.0.1)  0.341 ms  0.441 ms  1.725 ms
 2  lo0-100.NWRKNJ-VFTTP-309.verizon-gni.net (71.127.205.1)  4.126 ms  6.555 ms  8.083 ms
 3  at-0-0-0-1717.ALT2-CORE-RTR2.verizon-gni.net (100.41.5.70)  10.021 ms  9.159 ms  10.106 ms
 4  0.csi1.NBWKNJNB-MSE01-BB-SU1.ALTER.NET (140.222.4.106)  11.529 ms 0.csi1.NWRKNJ02-MSE01-BB-SU1.ALTER.NET (140.222.4.104)  11.727 ms 0.csi1.NBWKNJNB-MSE01-BB-SU1.ALTER.NET (140.222.4.106)  11.615 ms
 5  * * *
 6  * * *
 7  nyk-b2-link.ip.twelve99.net (80.239.192.36)  6.703 ms  6.880 ms  6.784 ms
 8  nyk-bb2-link.ip.twelve99.net (62.115.135.162)  9.091 ms  6.156 ms  6.017 ms
 9  ldn-bb4-link.ip.twelve99.net (62.115.112.245)  75.917 ms ldn-bb1-link.ip.twelve99.net (62.115.113.21)  78.821 ms ldn-bb4-link.ip.twelve99.net (62.115.112.245)  78.562 ms
10  ldn-b2-link.ip.twelve99.net (62.115.122.189)  83.447 ms ldn-b2-link.ip.twelve99.net (62.115.120.239)  78.660 ms ldn-b2-link.ip.twelve99.net (62.115.122.189)  83.480 ms
11  jisc-ic345131-ldn-b2.ip.twelve99-cust.net (62.115.175.131)  80.799 ms  81.055 ms  77.332 ms
12  ae24.londhx-sbr1.ja.net (146.97.35.197)  75.097 ms  73.519 ms  78.504 ms
13  ae29.londpg-sbr2.ja.net (146.97.33.2)  76.043 ms  78.558 ms  76.892 ms
14  ral-r26.ja.net (146.97.41.34)  79.165 ms  79.091 ms  76.619 ms

ID: 67031
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 67032 - Posted: 24 Dec 2022, 18:25:18 UTC - in response to Message 67031.  

But somehow, we need to bridge the gap between .ja.net (I think that's the UK's "Joint Academic Network") and upload11.cpdn.org.
ID: 67032
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67033 - Posted: 24 Dec 2022, 18:43:41 UTC - in response to Message 67032.  

There is a big gap there...
There is a big delay in crossing the ocean from here in North America to Europe, although I very much doubt the delay has anything to do with the present problem.

Is that by cable or by satellite? Thank gawd they are no longer putting rolls of magnetic tape on airplanes as they did a few decades ago. Good enough for e-mail, I guess.
$ traceroute upload11.cpdn.org
traceroute to upload11.cpdn.org (192.171.169.187), 30 hops max, 60 byte packets
 1  Fios_Quantum_Gateway.fios-router.home (192.168.0.1)  0.364 ms  0.509 ms  0.641 ms
 2  lo0-100.NWRKNJ-VFTTP-309.verizon-gni.net (71.127.205.1)  7.899 ms  5.678 ms  7.925 ms
 3  at-0-0-0-1717.ALT2-CORE-RTR2.verizon-gni.net (100.41.5.70)  8.002 ms at-0-0-0-1716.ALT2-CORE-RTR1.verizon-gni.net (100.41.5.68)  10.877 ms  8.141 ms
 4  0.csi1.NWRKNJ02-MSE01-BB-SU1.ALTER.NET (140.222.4.104)  11.025 ms 0.csi1.NBWKNJNB-MSE01-BB-SU1.ALTER.NET (140.222.4.106)  10.587 ms 0.csi1.NWRKNJ02-MSE01-BB-SU1.ALTER.NET (140.222.4.104)  10.918 ms
 5  * * *
 6  * * *
 7  nyk-b2-link.ip.twelve99.net (80.239.192.36)  6.520 ms  6.612 ms  6.688 ms
 8  * nyk-bb2-link.ip.twelve99.net (62.115.135.162)  8.858 ms nyk-bb1-link.ip.twelve99.net (62.115.135.160)  11.298 ms
 9  * ldn-bb1-link.ip.twelve99.net (62.115.113.21)  77.185 ms  79.592 ms
10  ldn-b2-link.ip.twelve99.net (62.115.120.239)  76.907 ms ldn-b2-link.ip.twelve99.net (62.115.122.189)  82.196 ms ldn-b2-link.ip.twelve99.net (62.115.120.239)  76.953 ms
11  jisc-ic345131-ldn-b2.ip.twelve99-cust.net (62.115.175.131)  82.019 ms  79.404 ms  81.916 ms
12  ae24.londhx-sbr1.ja.net (146.97.35.197)  74.691 ms  74.754 ms  77.150 ms
13  ae29.londpg-sbr2.ja.net (146.97.33.2)  77.274 ms  75.398 ms  74.099 ms
14  ae31.erdiss-sbr2.ja.net (146.97.33.22)  81.635 ms  79.846 ms  82.346 ms
15  * * *
16  ral-r26.ja.net (146.97.41.34)  82.361 ms  79.864 ms  79.845 ms
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
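
Both Linux traceroutes in this exchange get as far as ral-r26.ja.net (146.97.41.34) and then stop answering. A minimal Python sketch for picking out the last hop that replied, assuming the default Linux traceroute output shown above (hop number, host name, address in parentheses, timings, and `* * *` for no reply). The function name `last_responding_hop` is hypothetical, and Windows tracert output (as in Richard's post) uses a different layout that this does not handle. A silent hop only means no ICMP reply came back, as hops 5 and 6 show, so it is a hint about where packets stop rather than proof.

```python
import re
from typing import Optional, Tuple

# A Linux traceroute hop line that got a reply, e.g.
#  16  ral-r26.ja.net (146.97.41.34)  82.361 ms  79.864 ms  79.845 ms
# Leading "*" probes are skipped; hops with only "* * *" do not match.
HOP_RE = re.compile(r"^\s*(\d+)\s+(?:\*\s+)*(\S+)\s+\(([\d.]+)\)")


def last_responding_hop(output: str) -> Optional[Tuple[int, str, str]]:
    """Return (hop number, host name, IP address) of the last hop that replied."""
    last = None
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if match:
            last = (int(match.group(1)), match.group(2), match.group(3))
    return last


if __name__ == "__main__":
    sample = """traceroute to upload11.cpdn.org (192.171.169.187), 30 hops max, 60 byte packets
 14  ae31.erdiss-sbr2.ja.net (146.97.33.22)  81.635 ms  79.846 ms  82.346 ms
 15  * * *
 16  ral-r26.ja.net (146.97.41.34)  82.361 ms  79.864 ms  79.845 ms
 17  * * *
 18  * * *"""
    print(last_responding_hop(sample))  # (16, 'ral-r26.ja.net', '146.97.41.34')
```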

ID: 67033
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67035 - Posted: 24 Dec 2022, 21:52:43 UTC

Email sent, but I do not see why Andy should sort this on Christmas Day! It may well be the new year before it gets sorted.
ID: 67035
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67039 - Posted: 25 Dec 2022, 7:34:34 UTC

From Andy

Hi Dave,

Hmmm, thanks, I can't get into this machine. I can reboot it, but the SSH port is still inaccessible. My theory at the moment is that it's an issue at the JASMIN cloud level (where this machine resides); however, I am not sure. If this persists I will contact their support.

Best wishes,

Andy
ID: 67039
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67053 - Posted: 26 Dec 2022, 12:40:45 UTC - in response to Message 67039.  

It does persist.
ID: 67053
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67054 - Posted: 26 Dec 2022, 16:28:28 UTC - in response to Message 67053.  
Last modified: 26 Dec 2022, 16:32:53 UTC

It does persist.
The message in the event log has changed from "Internet access OK..." to "transient HTTP error". This suggests something may have changed and that it is now the server getting hammered that is causing the problem.

Edit: Maybe I spoke too soon. The "project servers may be temporarily down" message appears eventually, though "users in past 24 hours" has gone up from 0 to 1, so maybe someone has got a task through. Will try again in an hour or so.
ID: 67054
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67055 - Posted: 26 Dec 2022, 17:11:23 UTC - in response to Message 67054.  

The message in the event log has changed from "Internet access OK..." to "transient HTTP error". This suggests something may have changed and that it is now the server getting hammered that is causing the problem.

Edit: Maybe I spoke too soon. The "project servers may be temporarily down" message appears eventually, though "users in past 24 hours" has gone up from 0 to 1, so maybe someone has got a task through. Will try again in an hour or so.


I get both. N.B.: I am in the EST time zone.

Mon 26 Dec 2022 11:12:48 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0514_2006050100_123_975_12192158_0_r1364261494_0.zip
Mon 26 Dec 2022 11:12:48 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_69.zip
Mon 26 Dec 2022 11:14:49 AM EST |  | Project communication failed: attempting access to reference site
Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0514_2006050100_123_975_12192158_0_r1364261494_0.zip: transient HTTP error
Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Backing off 01:43:50 on upload of oifs_43r3_ps_0514_2006050100_123_975_12192158_0_r1364261494_0.zip
Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_69.zip: transient HTTP error
Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Backing off 00:19:18 on upload of oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_69.zip
Mon 26 Dec 2022 11:14:51 AM EST |  | Internet access OK - project servers may be temporarily down.
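
As the two lines logged at 11:14:49 show, each pending upload carries its own backoff timer, which is why retries range from a couple of minutes to well over an hour. A minimal Python sketch for pulling the "Backing off HH:MM:SS on upload of FILE" lines out of a saved event log and listing each file's latest backoff in seconds, assuming the line format shown above; the function name `upload_backoffs` is hypothetical and not part of BOINC.

```python
import re
from typing import Dict

# Matches lines such as
# "... | Backing off 01:43:50 on upload of oifs_..._0.zip"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def upload_backoffs(log: str) -> Dict[str, int]:
    """Map each upload file name to its most recent backoff, in seconds."""
    backoffs: Dict[str, int] = {}
    for line in log.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            # Later lines overwrite earlier ones, so the latest backoff wins.
            backoffs[name] = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return backoffs


if __name__ == "__main__":
    log = (
        "Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Backing off 01:43:50 on upload of "
        "oifs_43r3_ps_0514_2006050100_123_975_12192158_0_r1364261494_0.zip\n"
        "Mon 26 Dec 2022 11:14:49 AM EST | climateprediction.net | Backing off 00:19:18 on upload of "
        "oifs_43r3_ps_0438_1998050100_123_967_12184082_1_r1277369987_69.zip\n"
    )
    # Shortest backoff first: these files will be retried soonest.
    for name, secs in sorted(upload_backoffs(log).items(), key=lambda kv: kv[1]):
        print(f"{secs:>6} s  {name}")
```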

ID: 67055
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67056 - Posted: 26 Dec 2022, 23:07:55 UTC - in response to Message 67055.  

Will confirm to Andy nothing moving in the morning.
ID: 67056
biodoc
Joined: 2 Oct 19
Posts: 21
Credit: 47,674,094
RAC: 24,265
Message 67057 - Posted: 26 Dec 2022, 23:42:37 UTC - in response to Message 67056.  

Will confirm to Andy nothing moving in the morning.


That would be great. I have a total backlog of 13,315 files of 14.5 MB each to upload from 4 computers. That's around 193 GB.
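
The 193 GB figure is just file count times file size. A back-of-the-envelope Python check, assuming every result file is exactly 14.5 MB and using decimal units (1 GB = 1000 MB); `backlog_gb` is a hypothetical helper name.

```python
# Back-of-the-envelope size of an upload backlog.
# Assumes every result file is the same size and uses decimal units (1 GB = 1000 MB).


def backlog_gb(n_files: int, mb_per_file: float) -> float:
    """Total backlog in gigabytes."""
    return n_files * mb_per_file / 1000


if __name__ == "__main__":
    total = backlog_gb(13_315, 14.5)
    print(f"{total:.1f} GB")  # 193.1 GB
```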
ID: 67057
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,802,854
RAC: 19,763
Message 67058 - Posted: 27 Dec 2022, 3:18:28 UTC - in response to Message 67057.  

That would be great. I have a total backlog of 13,315 files of 14.5 MB each to upload from 4 computers. That's around 193 GB.

I have 100 GB in total and I thought I had a lot. :-)
ID: 67058
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67059 - Posted: 27 Dec 2022, 7:56:55 UTC
Last modified: 27 Dec 2022, 12:10:32 UTC

Strange: the number of users in past 24 hours for this type of task has gone up from 1 to 2. This implies someone has finished a task and got it to report, though I see no sign of uploads shifting. I wonder if one of the older, smaller batches went to a different server?

Email sent.

Edit: I guess it could be computers that finished uploading but were turned off before their one-hour backoff expired, and so have only now reported their tasks?
ID: 67059
OliverF
Joined: 23 Nov 19
Posts: 4
Credit: 6,597,088
RAC: 79,816
Message 67073 - Posted: 28 Dec 2022, 10:02:13 UTC - in response to Message 67059.  

Strange: the number of users in past 24 hours for this type of task has gone up from 1 to 2. This implies someone has finished a task and got it to report, though I see no sign of uploads shifting. I wonder if one of the older, smaller batches went to a different server?

Email sent.

Edit: I guess it could be computers that finished uploading but were turned off before their one-hour backoff expired, and so have only now reported their tasks?


Is "active users" counting results or active jobs? Because I was sooo happy that CPDN had an abundance of jobs and joined the party, only to find out I can't get rid of my results.
Two machines crunching, two hard drives slowly filling up.

/Oliver
ID: 67073
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 67074 - Posted: 28 Dec 2022, 10:05:42 UTC

Is "active users" counting results or active jobs?
"Users in past 24 hours" means the number of users who have completed and reported tasks. Currently the only users counted on the server status page are from WAH2 Windows work, which goes to a different server.
ID: 67074
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,802,854
RAC: 19,763
Message 67087 - Posted: 28 Dec 2022, 12:15:56 UTC - in response to Message 67073.  

... I was sooo happy that CPDN had an abundance of jobs and joined the party, only to find out I can't get rid of my results.
Two machines crunching, two hard drives slowly filling up.

Thanks, that's funny. :-) Initially it's "Where's the work?!", now it's "How do I get rid of the results?!"
ID: 67087

©2024 cpdn.org