OpenIFS Discussion

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66852 - Posted: 10 Dec 2022, 15:26:43 UTC

Last time they were released on consecutive days, but it is likely there will be some overlap, with both being on the server at once, even if they are released a day apart.

Vato
Joined: 4 Oct 19
Posts: 15
Credit: 9,174,915
RAC: 3,722
Message 66853 - Posted: 10 Dec 2022, 15:39:14 UTC - in response to Message 66851.  

WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further?


No, these max_concurrent settings only control the execution of these tasks once downloaded.
The server selects which tasks to assign with no knowledge of your app_config.xml

AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,796,205
RAC: 19,574
Message 66854 - Posted: 10 Dec 2022, 18:13:56 UTC - in response to Message 66851.  

WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further?

No, these max_concurrent settings only control the execution of these tasks once downloaded.
The server selects which tasks to assign with no knowledge of your app_config.xml

Additionally, a value of 0 (zero) in max_concurrent and project_max_concurrent means the opposite of what one may assume: it means no limit. It doesn't mean that no tasks will run, but rather that any and all downloaded tasks will run (limited by resource settings). There's no direct way to prevent all tasks of an app from running using app_config; you have to suspend them via BOINC Manager. Probably the simplest thing to do is to use project_max_concurrent and plan on only getting OIFS tasks (regardless of which type), as Hadley resends are unlikely by now.
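For anyone who wants to try this, here is a minimal sketch of generating an app_config.xml with both settings. It assumes the app name is "oifs_43r3_ps" (inferred from the task names in this thread, so check your own client's app names) and the limits 2 and 4 are made-up values. The helper names `effective_limit` and `build_app_config` are hypothetical, and `effective_limit` only models the "0 means no limit" behaviour described above, not the client's actual code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def effective_limit(value: int) -> Optional[int]:
    """Model the semantics described in the thread.

    0 means "no limit" (returned as None); a positive value is the cap.
    """
    if value < 0:
        raise ValueError("limit must be zero or positive")
    return None if value == 0 else value


def build_app_config(app_name: str, app_limit: int, project_limit: int) -> str:
    """Return app_config.xml text with a per-app and a project-wide cap."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(app_limit)
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # App name and limits are illustrative assumptions, not project advice.
    print(build_app_config("oifs_43r3_ps", app_limit=2, project_limit=4))
```

The output would go in the project's directory under the BOINC data folder, and the client has to re-read its config files before the limits apply. Remember these limits only affect what runs, not what the server sends.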

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66861 - Posted: 11 Dec 2022, 9:58:55 UTC

On the uploads to upload11: some changes have been made at Oxford. Andy hasn't been near a computer and won't be till Monday, when it will likely get sorted.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 66866 - Posted: 11 Dec 2022, 21:35:27 UTC - in response to Message 66854.  

Additionally, a value of 0 (zero) in max_concurrent and project_max_concurrent means the opposite of what one may assume.
Nothing makes sense in Boinc.

Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,414,270
RAC: 16,212
Message 66867 - Posted: 11 Dec 2022, 23:37:07 UTC - in response to Message 66854.  

What's really needed is to have the list of apps in the project preferences under your CPDN account, where they can then be individually selected, as other projects do. I have brought this up with the CPDN folk; it's on the to-do list, just not very high priority.
WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further?

No, these max_concurrent settings only control the execution of these tasks once downloaded.
The server selects which tasks to assign with no knowledge of your app_config.xml

Additionally, a value of 0 (zero) in max_concurrent and project_max_concurrent means the opposite of what one may assume: it means no limit. It doesn't mean that no tasks will run, but rather that any and all downloaded tasks will run (limited by resource settings). There's no direct way to prevent all tasks of an app from running using app_config; you have to suspend them via BOINC Manager. Probably the simplest thing to do is to use project_max_concurrent and plan on only getting OIFS tasks (regardless of which type), as Hadley resends are unlikely by now.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66876 - Posted: 12 Dec 2022, 18:24:06 UTC - in response to Message 66861.  

On the uploads to upload11: some changes have been made at Oxford. Andy hasn't been near a computer and won't be till Monday, when it will likely get sorted.
Now looking like a few days. I have got one retread running which is now up to six zips waiting to go. If I get a second one or more I will suspend them till uploads resume.

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 66877 - Posted: 12 Dec 2022, 19:09:57 UTC

Oof. Yeah, all my stuff is backed up badly too. I've got 10 or 15 tasks' worth of final data ready to go, in addition to all the trickles. My (limited) upload is going to be jammed once this issue is cleared. :( I may have to route the boxes out over Starlink for a while.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66878 - Posted: 12 Dec 2022, 21:21:01 UTC - in response to Message 66877.  

Oof. Yeah, all my stuff is backed up badly too,


Me too. I have 918 files to upload.

Luckily for everyone, I am not going to post the list here. ;-)

Landjunge
Joined: 17 Aug 07
Posts: 8
Credit: 37,170,120
RAC: 14,197
Message 66883 - Posted: 13 Dec 2022, 12:08:36 UTC

Upload is working again! Thanks.

biodoc
Joined: 2 Oct 19
Posts: 21
Credit: 47,674,094
RAC: 24,265
Message 66885 - Posted: 13 Dec 2022, 12:23:18 UTC

Yes, all my uploads have finished.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66887 - Posted: 13 Dec 2022, 12:29:04 UTC - in response to Message 66883.  

Upload is working again! Thanks.
I saw a nonzero value in "users in last 24 hours" for OIFS tasks, so I guessed things were moving, but a few minutes ago I got the "Communication failed, project servers may be down" message. Most likely that is due to the server getting hammered; once the number of us trying to upload the backlog drops, things will return to normal.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66888 - Posted: 13 Dec 2022, 14:27:48 UTC - in response to Message 66887.  

I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK.
It has the new name:

    PID    PPID USER      PR  NI S    RES  %MEM  %CPU  P     TIME+ COMMAND                                                                   
1840432 1840429 boinc     39  19 R   2.5g   4.1  98.9  9 349:21.19 /var/lib/boinc/slots/10/oifs_43r3_model.exe

OpenIFS 43r3 Perturbed Surface 1.01 x86_64-pc-linux-gnu
Number of tasks completed 	24
Max tasks per day 	28
Number of tasks today 	0
Consecutive valid tasks 	24
Average processing rate 	27.91 GFLOPS
Average turnaround time 	1.33 days

OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed 	0
Max tasks per day 	4
Number of tasks today 	1
Consecutive valid tasks 	0
Average turnaro
 

Task 22250176
Name 	oifs_43r3_ps_0930_2021050100_123_945_12164019_1
Workunit 12164019
Created  13 Dec 2022, 8:22:44 UTC
Sent 	 13 Dec 2022, 8:25:20 UTC


I cannot upload my old "trickles". I see no point in retrying them manually for now; I may retry some of them manually later in the day.

Tue 13 Dec 2022 09:07:14 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip
Tue 13 Dec 2022 09:07:17 AM EST |  | Project communication failed: attempting access to reference site
Tue 13 Dec 2022 09:07:17 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip: connect() failed
Tue 13 Dec 2022 09:07:17 AM EST | climateprediction.net | Backing off 00:03:22 on upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip
Tue 13 Dec 2022 09:07:18 AM EST |  | Internet access OK - project servers may be temporarily down.
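
For anyone wanting to keep an eye on retries without watching the event log by hand, here is a rough sketch that pulls the back-off delay out of lines like the ones above. The regular expression is an assumption based only on the log lines quoted in this post, and `parse_backoff` is a hypothetical helper, not part of BOINC.

```python
import re
from datetime import timedelta
from typing import Optional, Tuple

# Pattern inferred from the log lines quoted above (an assumption, not a
# documented BOINC format): "<timestamp> | <project> | Backing off HH:MM:SS
# on upload of <file>".
BACKOFF_RE = re.compile(
    r"\|\s*(?P<project>[^|]*?)\s*\|\s*Backing off "
    r"(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) on upload of (?P<file>\S+)"
)


def parse_backoff(line: str) -> Optional[Tuple[str, str, timedelta]]:
    """Return (project, file name, delay) for a back-off line, else None."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    delay = timedelta(
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
    )
    return match["project"], match["file"], delay


if __name__ == "__main__":
    sample = (
        "Tue 13 Dec 2022 09:07:17 AM EST | climateprediction.net | "
        "Backing off 00:03:22 on upload of "
        "oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip"
    )
    print(parse_backoff(sample))
```

Lines that are not back-off messages (such as the "Internet access OK" one) simply return None, so the function can be run over a whole log.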


Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,414,270
RAC: 16,212
Message 66889 - Posted: 13 Dec 2022, 14:34:10 UTC - in response to Message 66888.  

I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK.
It has the new name:
Yep, I think these are re-runs of the hard failures from earlier batches. The new upload server has more capacity, so give it a little while to clear the backlog. I'm sure the transfers will go through fine on their own without any manual pushing. Mine did.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66890 - Posted: 13 Dec 2022, 15:09:37 UTC - in response to Message 66889.  

WOW! The new servers just came to my notice. I am sending two uploads at a time, each running at around 4500 KBytes per second, as fast as they will go on my 75 Megabit per second fiber-optic connection. All the while downloading tasks from WCG.

All my uploads are now complete.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66891 - Posted: 13 Dec 2022, 15:14:55 UTC - in response to Message 66889.  

I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK.
It has the new name:

yep, I think these are re-runs of the hard failures from earlier batches. The new upload server has more capacity so give it a little while to clear the backlog. I'm sure the transfers will go through fine on their own without any manual pushing. Mine did.


I agree: the new upload server has way more capacity than I have ever seen. All my backlog is now complete with no help from me.

I am not sure if the task I got overnight is a hard failure from earlier batches because it uses a new model, with the new name (1.05 instead of 1.01). OTOH, the one I got is definitely a re-run.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66892 - Posted: 13 Dec 2022, 17:57:04 UTC - in response to Message 66890.  

CPDN now recognizes their new connection speeds: 3 to 4 Megabytes per second, i.e., 24 to 32 megabits per second. It was maintaining two uploads like this at a time this morning while working through the more than 900 uploads queued up on my machine.
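
The byte-to-bit arithmetic is easy to check. A small sketch, assuming decimal units (1 KByte = 1000 bytes, 1 Megabit = 1,000,000 bits); if BOINC's "KB" actually means 1024 bytes, the results come out about 2.4% higher. The function names are just illustrative.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert Megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


def kbytes_to_mbits(kbytes_per_s: float, bytes_per_kbyte: int = 1000) -> float:
    """Convert KBytes per second to megabits per second.

    bytes_per_kbyte defaults to 1000 (an assumption); pass 1024 for binary units.
    """
    return kbytes_per_s * bytes_per_kbyte * 8 / 1_000_000


if __name__ == "__main__":
    # 3 to 4 Megabytes per second, as quoted above.
    print(mbytes_to_mbits(3), mbytes_to_mbits(4))
    # Two simultaneous uploads at roughly 4500 KBytes per second each.
    print(2 * kbytes_to_mbits(4500))
```

Under those assumptions, two uploads at about 4500 KBytes per second each add up to roughly 72 megabits per second, which is close to the nominal speed of a 75 Megabit line.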

My machine has a 75 Megabit/second fiber-optic network connection. Here is what I am actually getting.

Timestamp 	    Download   Upload 	   Latency   Jitter 	Quality Score 	Test Server
12/13/2022 12:37:5  79.44 Mbps 89.82 Mbps  5 ms      1 ms       Excellent       speedgauge2.optonline.net.prod.hosts.ooklaserver.net
11/29/2022 16:30:21 78.70 Mbps 89.08 Mbps  6 ms      1 ms       Excellent       nyc.speedtest.clouvider.net.prod.hosts.ooklaserver.net
11/8/2022 15:24:14  80.83 Mbps 89.12 Mbps  6 ms      2 ms       Excellent       ny2.speedtest.gslnetworks.com.prod.hosts.ooklaserver.net

Computer 1511241
Computer information

CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Operating System 	Red Hat Enterprise Linux 8.6 (Ootpa) [4.18.0-372.26.1.el8_6.x86_64|libc 2.28]
BOINC version 	7.20.2
Memory 	62.28 GB
Cache 	16896 KB
Measured floating point speed 	6.13 billion ops/sec
Measured integer speed 	       26.09 billion ops/sec
Average upload rate 	3017.36 KB/sec <---<<<
Average download rate 	4776.08 KB/sec <---<<<
Average turnaround time 	4.43 days


Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66893 - Posted: 14 Dec 2022, 8:57:48 UTC

Something funny has happened, the estimated time for these has gone up to over 4 days on the three resends I have. (actual time is going to be about 12 hours.) And my bored band is keeping up with the three resends I have running that arrived during the night. Wind must be in the right direction. I looked at the success rate which includes those that have succeeded at second or subsequent attempts and the three batches are at 68, 72 and 74% at the moment. May be a fraction higher because I think that stat is just updated once a day at midnight.

biodoc
Joined: 2 Oct 19
Posts: 21
Credit: 47,674,094
RAC: 24,265
Message 66894 - Posted: 14 Dec 2022, 9:39:40 UTC - in response to Message 66893.  
Last modified: 14 Dec 2022, 9:40:36 UTC

Something funny has happened, the estimated time for these has gone up to over 4 days on the three resends I have. (actual time is going to be about 12 hours.) And my bored band is keeping up with the three resends I have running that arrived during the night. Wind must be in the right direction. I looked at the success rate which includes those that have succeeded at second or subsequent attempts and the three batches are at 68, 72 and 74% at the moment. May be a fraction higher because I think that stat is just updated once a day at midnight.

A new version (1.05) of the OpenIFS 43r3 Perturbed Surface application was distributed on Dec. 12th. The last 4 resends I received used the new app. version. Three completed successfully and 1 is in progress.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,984,965
RAC: 21,892
Message 66895 - Posted: 14 Dec 2022, 10:18:34 UTC - in response to Message 66894.  

And another oddity: two of the resends, from batch 947, went up to 100% and then dropped back to 99.990% as I was watching. When time remaining dropped to 0, they kept showing as running despite negligible CPU usage in top. I have suspended them in case letting them continue might scupper getting any information from the slot files, though I may have to kill the processes to stop them showing as running. I will wait till the third task, from 945, finishes just to ensure I don't kill the wrong process.