Message boards : Number crunching : OpenIFS Discussion
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
(If this is the wrong place for this question, where should I put it?) Currently, I am using this as my app_config file for Climate Prediction models: [/var/lib/boinc/projects/climateprediction.net]$ cat app_config.xml <app_config> <project_max_concurrent>5</project_max_concurrent> </app_config> If I wish to change this so that two models run concurrently, it is obvious how to do it. But how would I restrict my machine to run only two OpenIFS models, but still allow five traditional models to run if there are no OpenIFS models? Or to allow two OpenIFS models and three traditional models, or something like that? I have 65 GBytes RAM and a fast Internet connection. Memory 62.28 GB; Cache 16896 KB; Total disk space 488.04 GB; Free disk space 482.25 GB; Measured floating point speed 6.13 billion ops/sec; Measured integer speed 26.09 billion ops/sec; Average upload rate 149.38 KB/sec; Average download rate 11956.26 KB/sec |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977 |
Refer to the user manual for project-level configuration, where the full list of options for app_config.xml is defined. You would need to add a separate <app>...</app> section for each IFS variant, once we know the exact application names in use. You could then use <max_concurrent> to limit each IFS type, but I don't see a way to limit the total number of IFS tasks of all types once multiple versions are in play at the same time. |
Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
"You would need to add a separate <app>...</app> section for each IFS variant, once we know the exact application names in use. You could then use <max_concurrent> to limit each IFS type, but I don't see a way to limit the total number of IFS tasks of all types once multiple versions are in play at the same time." Depends how likely it is that batches of multiple variants will be out there at once, I guess. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"You would need to add a separate <app>...</app> section for each IFS variant, once we know the exact application names in use. You could then use <max_concurrent> to limit each IFS type, but I don't see a way to limit the total number of IFS tasks of all types once multiple versions are in play at the same time." I have an upper limit of 12 total BOINC jobs in the winter and 8 in the summer. (No air conditioning.) I am currently allowing an upper limit of 5 CPDN jobs. So now I have a prototype app_config.xml file that allows a max of 2 for each of the OpenIFS types and a max of 3 for each of the "traditional" ones. If there is only one variant of each distributed at a time, I should be OK. And the max limit of BOINC jobs will prevent disaster. I hope. It looks, in part, like this: $ cat app_config.xml <app_config> <app> <name>OpenIFSname1</name> <max_concurrent>2</max_concurrent> </app> <app> <name>OpenIFSname2</name> <max_concurrent>2</max_concurrent> </app> <app> <name>hadam3_8.09</name> <---<<< <max_concurrent>3</max_concurrent> </app> I guessed at the names of the traditional tasks. One is marked with <---<<< . Is that the correct way the traditional ones are named? This prototype is not in effect yet. |
Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
The last test batch was oifs_43r3_ps, so perturbed surface ones. |
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228 |
"You would need to add a separate <app>...</app> section for each IFS variant, once we know the exact application names in use. You could then use <max_concurrent> to limit each IFS type, but I don't see a way to limit the total number of IFS tasks of all types once multiple versions are in play at the same time." Add a project max concurrent to that and job's a good'un. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Add a project max concurrent to that and job's a good'un." Do you mean like this? $ cat app_config.xml <app_config> <project_max_concurrent>5</project_max_concurrent> <app> <name>OpenIFSname1</name> <max_concurrent>2</max_concurrent> </app> <app> <name>OpenIFSname2</name> <max_concurrent>2</max_concurrent> </app> <app> <name>hadam3_8.09</name> <max_concurrent>3</max_concurrent> </app> <app> <name>hadam3_8.52</name> <max_concurrent>3</max_concurrent> </app> <app> <name>hadcm3s_8.36</name> <max_concurrent>3</max_concurrent> </app> <app> <name>hadsm3_8.02</name> <max_concurrent>3</max_concurrent> </app> </app_config> If so, why have the itemized list of the traditional work units in there at all? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
@Dave - could we move this discussion to a new thread 'OpenIFS Discussions' please? Would be more appropriate there, and keep this thread for FAQ only. Otherwise the FAQ will get lost in the long chain of messages. I've already sent a list of OpenIFS app names to the forums. Please see this message: https://www.cpdn.org/forum_thread.php?id=9149&postid=66352 I will add these to the FAQ. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977 |
For the discussion thread: For those adding app_config for the first time, and wanting to cover the 'traditional' app names, I have: hadam4, hadam4h, hadcm3s, hadsm4. In other words, the version numbers have not traditionally been included in the app names. |
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228 |
"Add a project max concurrent to that and job's a good'un." Yes, the project max controls the overall count and the itemised list controls the individual apps, so you have as much control as you want. |
Joined: 12 Apr 21 Posts: 317 Credit: 14,816,935 RAC: 19,934 |
"If so, why have the itemized list of the traditional work units in there at all?" With that app_config you'll run at most 5 CPDN tasks in total. They'll run on a first-come, first-served basis. So it's possible, if work is available for multiple apps, that some of them won't get to run at all (tasks may download but not start) because the total number of running CPDN tasks will already be at 5. There's no way to control things conditionally (if...then) in app_config. You don't need to itemize; the only time it'll make a difference is if work is available for multiple apps at the same time. Check Richard's post for the correct Hadley model names. |
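Pulling the suggestions above together, here is a minimal sketch of what the combined file might look like. It is not a tested or official configuration. It assumes the traditional app names listed in Richard's post (without version numbers) and the OpenIFS app names that appear elsewhere in this thread (oifs_43r3, oifs_43r3_ps, oifs_43r3_bl); the limits of 5, 2 and 3 are simply the ones proposed in the posts above. The XML comments are there for the reader and can be deleted.

```xml
<app_config>
  <!-- Overall cap on CPDN tasks running at the same time (value from the posts above) -->
  <project_max_concurrent>5</project_max_concurrent>

  <!-- OpenIFS apps: names assumed from elsewhere in this thread; at most 2 of each at once -->
  <app>
    <name>oifs_43r3</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app>
    <name>oifs_43r3_ps</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app>
    <name>oifs_43r3_bl</name>
    <max_concurrent>2</max_concurrent>
  </app>

  <!-- Traditional Hadley apps: names without version numbers; at most 3 of each at once -->
  <app>
    <name>hadam4</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>hadam4h</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>hadcm3s</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app>
    <name>hadsm4</name>
    <max_concurrent>3</max_concurrent>
  </app>
</app_config>
```

Each per-app limit applies to that app alone, so if several OpenIFS variants have work out at once, more than two OpenIFS tasks could run in total, bounded only by the project cap of 5. That is the limitation described earlier in the thread. Note also that a later post reports a file of this shape not taking effect on one host, and the cause is not established here.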
Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
I expect the answer is negative but would the app_config file accept wildcards? If it did, that would be a way to limit the total number of OpenIFS tasks. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977 |
"I expect the answer is negative, but would the app_config file accept wildcards? If it did, that would be a way to limit the total number of OpenIFS tasks." I think it's highly unlikely, but I can check the code. That's not the way David Anderson's mind works. Edit: no, there is no sign of a wildcard handler in the code. |
Joined: 20 Dec 20 Posts: 13 Credit: 40,046,692 RAC: 9,354 |
Hello, I have two OpenIFS 43r3 Baroclinic Lifecycle v1.03 x86_64-pc-linux-gnu tasks on my server: https://www.cpdn.org/results.php?hostid=1533394 The two tasks seem to have finished, but their uploads to the server do not start:
======== File transfers ========
1) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_0.zip direction: upload sticky: no xfer active: no time_so_far: 18.170831 bytes_xferred: 113.000000 xfer_speed: 0.000000
2) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_1.zip direction: upload sticky: no xfer active: no time_so_far: 15.158657 bytes_xferred: 113.000000 xfer_speed: 0.000000
3) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_2.zip direction: upload sticky: no xfer active: no time_so_far: 6.125612 bytes_xferred: 113.000000 xfer_speed: 0.000000
4) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_3.zip direction: upload sticky: no xfer active: no time_so_far: 6.125960 bytes_xferred: 113.000000 xfer_speed: 0.000000
5) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_4.zip direction: upload sticky: no xfer active: no time_so_far: 3.013341 bytes_xferred: 113.000000 xfer_speed: 0.000000
6) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_5.zip direction: upload sticky: no xfer active: no time_so_far: 4.072802 bytes_xferred: 113.000000 xfer_speed: 0.000000
7) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_6.zip direction: upload sticky: no xfer active: no time_so_far: 4.069479 bytes_xferred: 113.000000 xfer_speed: 0.000000
8) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_7.zip direction: upload sticky: no xfer active: no time_so_far: 4.022983 bytes_xferred: 113.000000 xfer_speed: 0.000000
9) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_8.zip direction: upload sticky: no xfer active: no time_so_far: 4.023066 bytes_xferred: 113.000000 xfer_speed: 0.000000
10) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_9.zip direction: upload sticky: no xfer active: no time_so_far: 4.058683 bytes_xferred: 113.000000 xfer_speed: 0.000000
11) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_10.zip direction: upload sticky: no xfer active: no time_so_far: 4.058159 bytes_xferred: 113.000000 xfer_speed: 0.000000
12) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_11.zip direction: upload sticky: no xfer active: no time_so_far: 4.024752 bytes_xferred: 113.000000 xfer_speed: 0.000000
13) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_12.zip direction: upload sticky: no xfer active: no time_so_far: 4.024706 bytes_xferred: 113.000000 xfer_speed: 0.000000
14) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_13.zip direction: upload sticky: no xfer active: no time_so_far: 4.078303 bytes_xferred: 113.000000 xfer_speed: 0.000000
15) ----------- name: oifs_43r3_bl_k028_2016092300_15_943_12163082_0_r29951873_14.zip direction: upload sticky: no xfer active: no time_so_far: 5.123589 bytes_xferred: 113.000000 xfer_speed: 0.000000
16) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_0.zip direction: upload sticky: no xfer active: no time_so_far: 10.116536 bytes_xferred: 113.000000 xfer_speed: 0.000000
17) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_1.zip direction: upload sticky: no xfer active: no time_so_far: 6.096991 bytes_xferred: 113.000000 xfer_speed: 0.000000
18) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_2.zip direction: upload sticky: no xfer active: no time_so_far: 6.122655 bytes_xferred: 113.000000 xfer_speed: 0.000000
19) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_3.zip direction: upload sticky: no xfer active: no time_so_far: 6.123920 bytes_xferred: 113.000000 xfer_speed: 0.000000
20) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_4.zip direction: upload sticky: no xfer active: no time_so_far: 37.340122 bytes_xferred: 113.000000 xfer_speed: 0.000000
21) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_5.zip direction: upload sticky: no xfer active: no time_so_far: 7.092978 bytes_xferred: 113.000000 xfer_speed: 0.000000
22) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_6.zip direction: upload sticky: no xfer active: no time_so_far: 8.045169 bytes_xferred: 113.000000 xfer_speed: 0.000000
23) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_7.zip direction: upload sticky: no xfer active: no time_so_far: 6.124228 bytes_xferred: 113.000000 xfer_speed: 0.000000
24) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_8.zip direction: upload sticky: no xfer active: no time_so_far: 6.086962 bytes_xferred: 113.000000 xfer_speed: 0.000000
25) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_9.zip direction: upload sticky: no xfer active: no time_so_far: 4.025336 bytes_xferred: 113.000000 xfer_speed: 0.000000
26) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_10.zip direction: upload sticky: no xfer active: no time_so_far: 6.041083 bytes_xferred: 113.000000 xfer_speed: 0.000000
27) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_11.zip direction: upload sticky: no xfer active: no time_so_far: 4.023235 bytes_xferred: 113.000000 xfer_speed: 0.000000
28) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_12.zip direction: upload sticky: no xfer active: no time_so_far: 4.058181 bytes_xferred: 113.000000 xfer_speed: 0.000000
29) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_13.zip direction: upload sticky: no xfer active: no time_so_far: 4.061807 bytes_xferred: 113.000000 xfer_speed: 0.000000
30) ----------- name: oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_14.zip direction: upload sticky: no xfer active: no time_so_far: 4.024040 bytes_xferred: 113.000000 xfer_speed: 0.000000
I tried to upload them again, but without success. Error logs:
817: 26-Nov-2022 13:16:15 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_13.zip: transient upload error
818: 26-Nov-2022 13:16:15 (low) [climateprediction.net] Backing off 00:04:34 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_13.zip
819: 26-Nov-2022 13:16:15 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_14.zip: transient upload error
820: 26-Nov-2022 13:16:15 (low) [climateprediction.net] Backing off 00:07:17 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_14.zip
821: 26-Nov-2022 13:16:16 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_6.zip
822: 26-Nov-2022 13:16:18 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
823: 26-Nov-2022 13:16:18 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_6.zip: transient upload error
824: 26-Nov-2022 13:16:18 (low) [climateprediction.net] Backing off 00:13:19 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_6.zip
825: 26-Nov-2022 13:16:20 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_5.zip
826: 26-Nov-2022 13:16:22 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
827: 26-Nov-2022 13:16:22 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_5.zip: transient upload error
828: 26-Nov-2022 13:16:22 (low) [climateprediction.net] Backing off 00:13:51 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_5.zip
829: 26-Nov-2022 13:16:26 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_4.zip
830: 26-Nov-2022 13:16:28 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
831: 26-Nov-2022 13:16:28 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_4.zip: transient upload error
832: 26-Nov-2022 13:16:28 (low) [climateprediction.net] Backing off 00:08:58 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_4.zip
833: 26-Nov-2022 13:16:30 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_3.zip
834: 26-Nov-2022 13:16:32 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
835: 26-Nov-2022 13:16:32 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_3.zip: transient upload error
836: 26-Nov-2022 13:16:32 (low) [climateprediction.net] Backing off 00:14:24 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_3.zip
837: 26-Nov-2022 13:16:35 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_2.zip
838: 26-Nov-2022 13:16:37 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
839: 26-Nov-2022 13:16:37 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_2.zip: transient upload error
840: 26-Nov-2022 13:16:37 (low) [climateprediction.net] Backing off 00:13:46 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_2.zip
841: 26-Nov-2022 13:16:40 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_1.zip
842: 26-Nov-2022 13:16:42 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
843: 26-Nov-2022 13:16:42 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_1.zip: transient upload error
844: 26-Nov-2022 13:16:42 (low) [climateprediction.net] Backing off 00:10:22 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_1.zip
845: 26-Nov-2022 13:16:44 (low) [climateprediction.net] Started upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_0.zip
846: 26-Nov-2022 13:16:46 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
847: 26-Nov-2022 13:16:46 (low) [climateprediction.net] Temporarily failed upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_0.zip: transient upload error
848: 26-Nov-2022 13:16:46 (low) [climateprediction.net] Backing off 00:47:14 on upload of oifs_43r3_bl_k029_2016092300_15_943_12163083_0_r1290750425_0.zip
Kali. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977 |
"822: 26-Nov-2022 13:16:18 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir" That's critical, but should be easy to fix. |
Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
"I expect the answer is negative, but would the app_config file accept wildcards? If it did, that would be a way to limit the total number of OpenIFS tasks." "I think it's highly unlikely, but I can check the code. That's not the way David Anderson's mind works." Thanks for checking, Richard. I thought it highly unlikely, but also that it would be silly not to check. |
Joined: 20 Dec 20 Posts: 13 Credit: 40,046,692 RAC: 9,354 |
"822: 26-Nov-2022 13:16:18 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir" "That's critical, but should be easy to fix." Hi, I hope this will be corrected before December 2 (the deadline for sending). Kali. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"822: 26-Nov-2022 13:16:18 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir" "That's critical, but should be easy to fix." I've let Andy@CPDN know (he may know already). These are new apps, which is why they are being tested with small batches on the production site before the much bigger batches go out. The setup on the 'dev-test' site has to be moved over to the main production site. There are small differences between the two, so sometimes things can get missed, especially for new apps. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977 |
Understood. Unfortunately the batches are so small, and are snapped up so quickly, that I haven't seen a single one yet, on either site. When I do, I plan to make a full copy of the task specifications (app_version, workunit, result), so I can inspect details like the upload urls and give further guidance if needed. Mind you, even that wouldn't be enough to diagnose a permissions problem, which I suspect this may be. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Why doesn't this app_config.xml work? My WSL/Ubuntu client downloaded 4 oifs_43r3_bl tasks and, despite the following app_config, decided to start them all at once. Any ideas why? My reading of https://boinc.berkeley.edu/wiki/Client_configuration says this is all I need. The app names match those in client_state.xml. I don't need a project_max_concurrent, as that's optional. So why didn't this work? <app_config> <app> <name>hadsm4</name> <max_concurrent>2</max_concurrent> </app> <app> <name>oifs_43r3</name> <max_concurrent>1</max_concurrent> </app> <app> <name>oifs_43r3_ps</name> <max_concurrent>1</max_concurrent> </app> <app> <name>oifs_43r3_bl</name> <max_concurrent>1</max_concurrent> </app> </app_config> |
©2024 cpdn.org