Message boards : Number crunching : OpenIFS Discussion
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
Last time they were released on consecutive days, but it is likely there will be some overlap, with both being on the server at once, even if they are released a day apart. |
Send message Joined: 4 Oct 19 Posts: 15 Credit: 9,174,915 RAC: 3,722 |
WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further? No, these max_concurrent settings only control the execution of these tasks once downloaded. The server selects which tasks to assign with no knowledge of your app_config.xml |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,803,682 RAC: 19,762 |
WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further? Additionally, a value of 0 (zero) in max_concurrent and project_max_concurrent means the opposite of what one may assume. It means no limit. It doesn't mean that no tasks will run but rather that any and all tasks downloaded will run (limited by resource settings). There's no direct way to prevent all tasks of an app from running using app_config; you have to suspend them via BOINC Manager. Probably the simplest thing to do is to use project_max_concurrent and plan on only getting OIFS tasks (regardless of which type), as it's unlikely you will get Hadley resends by now. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
On the uploads to upload11, some changes have been made at Oxford. Andy hasn't been near a computer and won't be till Monday, when it will likely get sorted. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Additionally, a value of 0 (zero) in max_concurrent and project_max_concurrent means the opposite of what one may assume. Nothing makes sense in Boinc. |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,431,665 RAC: 17,512 |
What's really needed is to have the list of apps on the project preferences under your CPDN account, which can be then individually selected. As other projects do. I have brought this up with CPDN folk, it's on the Todo list, just not very high priority. WUs from this app name won’t be downloaded to the computer, means limit the climateprediction.net WUs on a particular computer further? |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
On the uploads to upload11, some changes have been made at Oxford. Andy hasn't been near a computer and won't be till Monday, when it will likely get sorted. Now looking like a few days. I have got one retread running which is now up to six zips waiting to go. If I get a second one or more I will suspend them till uploads resume. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Oof. Yeah, all my stuff is backed up badly too. I've got 10 or 15 tasks' worth of final data ready to go, in addition to all the trickles. My (limited) upload is going to be jammed once this issue is cleared. :( I may have to route the boxes out over Starlink for a while. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Oof. Yeah, all my stuff is backed up badly too, Me too. I have 918 files to upload. Luckily for everyone, I am not going to post the list here. ;-) |
Send message Joined: 17 Aug 07 Posts: 8 Credit: 37,173,433 RAC: 14,254 |
Upload is working again! Thanks. |
Send message Joined: 2 Oct 19 Posts: 21 Credit: 47,674,094 RAC: 24,265 |
Yes, all my uploads have finished. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
Upload is working again! Thanks. I saw a non-zero value in users in last 24 hours for OIFS tasks, so guessed things were moving, but a few minutes ago I got the "Communication failed, project servers may be down" message. Most likely that is due to the server getting hammered, and once the number of us trying to upload the backlog drops, things will return to normal. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK. It has the new name:
PID PPID USER PR NI S RES %MEM %CPU P TIME+ COMMAND
1840432 1840429 boinc 39 19 R 2.5g 4.1 98.9 9 349:21.19 /var/lib/boinc/slots/10/oifs_43r3_model.exe
OpenIFS 43r3 Perturbed Surface 1.01 x86_64-pc-linux-gnu
Number of tasks completed 24
Max tasks per day 28
Number of tasks today 0
Consecutive valid tasks 24
Average processing rate 27.91 GFLOPS
Average turnaround time 1.33 days
OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed 0
Max tasks per day 4
Number of tasks today 1
Consecutive valid tasks 0
Average turnaro
Task 22250176
Name oifs_43r3_ps_0930_2021050100_123_945_12164019_1
Workunit 12164019
Created 13 Dec 2022, 8:22:44 UTC
Sent 13 Dec 2022, 8:25:20 UTC
I cannot upload my old "trickles". I see no point in trying to Retry them manually. I may try to retry some of them manually later in the day.
Tue 13 Dec 2022 09:07:14 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip
Tue 13 Dec 2022 09:07:17 AM EST | | Project communication failed: attempting access to reference site
Tue 13 Dec 2022 09:07:17 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip: connect() failed
Tue 13 Dec 2022 09:07:17 AM EST | climateprediction.net | Backing off 00:03:22 on upload of oifs_43r3_ps_0930_2021050100_123_945_12164019_1_r137271713_43.zip
Tue 13 Dec 2022 09:07:18 AM EST | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,431,665 RAC: 17,512 |
I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK. Yep, I think these are re-runs of the hard failures from earlier batches. The new upload server has more capacity, so give it a little while to clear the backlog. I'm sure the transfers will go through fine on their own without any manual pushing. Mine did. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
WOW! The new servers just came to my notice. I am sending two files at a time, each uploading at around 4500 KBytes per second, which is as fast as they will go on my 75 Megabit per second fiber-optic connection. All the while I am downloading tasks from WCG. All my uploads are now complete. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I got a "new" task over night (my time) that now has almost six hours on it. It seems to be running OK. I agree: the new upload server has way more capacity than I have ever seen. All my backlog is now complete with no help from me. I am not sure if the task I got overnight is a hard failure from earlier batches because it uses a new model, with the new name (1.05 instead of 1.01). OTOH, the one I got is definitely a re-run. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
CPDN now recognizes their new connection speeds. 3 to 4 Megabytes per second; i.e., 24 to 32 megabits per second. It was maintaining two uploads like this at a time this morning while uploading my greater than 900 uploads all queued up on my machine. My machine has a 75 Megabit/second fiber-optic network connection. Here is what I am actually getting.
Timestamp Download Upload Latency Jitter Quality Score Test Server
12/13/2022 12:37:5 79.44 Mbps 89.82 Mbps 5 ms 1 ms Excellent speedgauge2.optonline.net.prod.hosts.ooklaserver.net
11/29/2022 16:30:21 78.70 Mbps 89.08 Mbps 6 ms 1 ms Excellent nyc.speedtest.clouvider.net.prod.hosts.ooklaserver.net
11/8/2022 15:24:14 80.83 Mbps 89.12 Mbps 6 ms 2 ms Excellent ny2.speedtest.gslnetworks.com.prod.hosts.ooklaserver.net
Computer 1511241
Computer information
CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 16
Operating System Red Hat Enterprise Linux 8.6 (Ootpa) [4.18.0-372.26.1.el8_6.x86_64|libc 2.28]
BOINC version 7.20.2
Memory 62.28 GB
Cache 16896 KB
Measured floating point speed 6.13 billion ops/sec
Measured integer speed 26.09 billion ops/sec
Average upload rate 3017.36 KB/sec <---<<<
Average download rate 4776.08 KB/sec <---<<<
Average turnaround time 4.43 days |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
Something funny has happened, the estimated time for these has gone up to over 4 days on the three resends I have. (actual time is going to be about 12 hours.) And my bored band is keeping up with the three resends I have running that arrived during the night. Wind must be in the right direction. I looked at the success rate which includes those that have succeeded at second or subsequent attempts and the three batches are at 68, 72 and 74% at the moment. May be a fraction higher because I think that stat is just updated once a day at midnight. |
Send message Joined: 2 Oct 19 Posts: 21 Credit: 47,674,094 RAC: 24,265 |
Something funny has happened, the estimated time for these has gone up to over 4 days on the three resends I have. (actual time is going to be about 12 hours.) And my bored band is keeping up with the three resends I have running that arrived during the night. Wind must be in the right direction. I looked at the success rate which includes those that have succeeded at second or subsequent attempts and the three batches are at 68, 72 and 74% at the moment. May be a fraction higher because I think that stat is just updated once a day at midnight. A new version (1.05) of the OpenIFS 43r3 Perturbed Surface application was distributed on Dec. 12th. The last 4 resends I received used the new app. version. Three completed successfully and 1 is in progress. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
And another oddity. Two of the resends, from batch 947, went up to 100% and then dropped back to 99.990% as I was watching. When time remaining dropped to 0 they kept showing as running despite negligible CPU usage in top. I have suspended them in case getting any information from the slot files might be scuppered by letting them continue, though it may be that I have to kill the processes to stop them showing as running. I will wait till the third task, from 945, finishes just to ensure I don't kill the wrong process. |
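
For anyone wanting to try the max_concurrent and project_max_concurrent settings discussed earlier in this thread, here is a minimal sketch that generates an app_config.xml. The app name `oifs_43r3_ps` is an assumption inferred from the task names quoted above, and the limits of 2 are purely illustrative; check the real app name on your own machine before using it. As noted in the thread, these settings only limit how many downloaded tasks run at once, they do not affect what the server sends, and a value of 0 means no limit.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_limits, project_limit):
    """Return app_config.xml text for the given per-app and project-wide limits.

    app_limits maps an app name to its max_concurrent value.
    Remember that 0 means "no limit", not "run nothing".
    """
    root = ET.Element("app_config")
    for name, limit in app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical app name and illustrative limits; adjust for your own host.
config_xml = build_app_config({"oifs_43r3_ps": 2}, project_limit=2)
print(config_xml)
```

The printed text would be saved as app_config.xml in the climateprediction.net project folder inside the BOINC data directory, after which the client needs to re-read its config files or be restarted.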
©2024 cpdn.org