Thread 'Feedback on running OpenIFS large memory (16-25 Gb+) configurations requested'

Glenn Carver

Joined: 29 Oct 17
Posts: 1067
Credit: 17,020,946
RAC: 5,160
Message 71696 - Posted: 29 Oct 2024, 10:58:21 UTC

Thanks to all who contributed to the testing. I reported the feedback from everyone on this thread to the CPDN Technical meeting yesterday. It was well received, and our thanks go to everyone.
---
CPDN Visiting Scientist
ID: 71696
Glenn Carver

Joined: 29 Oct 17
Posts: 1067
Credit: 17,020,946
RAC: 5,160
Message 71697 - Posted: 29 Oct 2024, 11:17:57 UTC

Regarding the GLIBC version issue highlighted in this thread, we've decided to build the next versions of the Linux applications (OpenIFS, HadAM4, HadSM4) on Ubuntu 18.04 LTS as it's in long-term support for years yet. Ubuntu 18.04 uses GLIBC version 2.27. This only affects new Linux applications and not the current ones.
---
CPDN Visiting Scientist
ID: 71697
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1066
Credit: 36,887,369
RAC: 1,533
Message 71698 - Posted: 29 Oct 2024, 13:04:08 UTC - in response to Message 71697.  

Regarding the GLIBC version issue highlighted in this thread, we've decided to build the next versions of the Linux applications (OpenIFS, HadAM4, HadSM4) on Ubuntu 18.04 LTS as it's in long-term support for years yet. Ubuntu 18.04 uses GLIBC version 2.27. This only affects new Linux applications and not the current ones.
The exact error messages from my Linux Mint 20 machine were:

./oifs_43r3_omp_model.exe: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./oifs_43r3_omp_model.exe)
./oifs_43r3_omp_model.exe: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./oifs_43r3_omp_model.exe)
./oifs_43r3_omp_model.exe: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./oifs_43r3_omp_model.exe)
That sounds as if 18.04 LTS should be good enough - but I could run a revised test if you want confirmation.
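
A rough way to check a host against the planned build target is sketched below. It is only an illustration: the helper names `parse_version` and `host_supports` are made up for this example, and it assumes the application needs nothing newer than the GLIBC 2.27 mentioned above. `platform.libc_ver()` is from the Python standard library and reports the C library the Python interpreter itself is linked against, which on a typical Linux install is the system glibc.

```python
import platform

# Build target quoted in the thread: Ubuntu 18.04 LTS ships GLIBC 2.27.
REQUIRED_GLIBC = (2, 27)


def parse_version(text):
    """Turn a version string such as '2.31' into a tuple of ints, e.g. (2, 31)."""
    return tuple(int(part) for part in text.split("."))


def host_supports(host_version, required=REQUIRED_GLIBC):
    """Return True if the host glibc is at least the version the app was built on."""
    return parse_version(host_version) >= required


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"Host glibc {version}, meets 2.27 target: {host_supports(version)}")
    else:
        print("Could not detect glibc on this host")
```

Comparing tuples rather than strings matters here, because a plain string comparison would rank "2.9" above "2.27".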
ID: 71698
PDW

Joined: 29 Nov 17
Posts: 83
Credit: 17,184,625
RAC: 13,161
Message 71911 - Posted: 9 Jan 2025, 15:56:37 UTC - in response to Message 71627.  

In reply to Glenn Carver's message of 15 Oct 2024:
I'm curious if you have an estimate of how many hosts would be eligible.
Yes, we checked the database. There are ~600 Linux hosts with 32+ GB RAM. Enough to make it workable.

Are you doubling up to 64+ GB now?
ID: 71911
DJStarfox

Joined: 27 Jan 07
Posts: 301
Credit: 3,288,263
RAC: 26,370
Message 71930 - Posted: 23 Jan 2025, 21:24:58 UTC

I'd be willing to try these models, under the following conditions:
* CPDN server will respect BOINC compute limits (preferences) per computer and not give work to machines with insufficient resources.
* Models will respect BOINC compute limits (max memory, CPU count, disk) and will not start if insufficient resources are available (i.e., in the event these limits have changed since the initial download).
* Checkpoint files are compressed (best effort).
* OpenIFS models are 'opt-in' via project preferences
* Sticky forum thread (can be read-only) for OpenIFS system requirements and warnings.

FYI... I only have 32GB RAM and 6 cores (12 threads)... so would machines like mine be able to run them well enough?
ID: 71930
Glenn Carver

Joined: 29 Oct 17
Posts: 1067
Credit: 17,020,946
RAC: 5,160
Message 71934 - Posted: 25 Jan 2025, 10:05:47 UTC - in response to Message 71930.  

In reply to DJStarfox's message of 23 Jan 2025:
I'd be willing to try these models, under the following conditions:
* CPDN server will respect BOINC compute limits (preferences) per computer and not give work to machines with insufficient resources.
This happens normally for all tasks. It's part of BOINC.

* Models will respect BOINC compute limits (max memory, CPU count, disk) and will not start if insufficient resource available. (i.e., in the event these limits change since initial download).
We found and reported a bug in the BOINC client code in the way it treated a task's requested memory. It's now been fixed by David Anderson but we do need everyone to update their client version when this fixed version is released. In the meantime, we'll only configure 1 task in progress per user.

* Checkpoint files are compressed (best effort).
Checkpoint files don't compress very well. They also take a long time to compress. I am still weighing up the pros & cons of adding compression in.

* OpenIFS models are 'opt-in' via project preferences
They will be.

* Sticky forum thread (can be read-only) for OpenIFS system requirements and warnings.
Good idea. The project preferences page will also contain more info on the tasks.

FYI... I only have 32GB RAM and 6 cores (12 threads)... so would machines like mine be able to run them well enough?
32 GB would be enough if you are not using the machine for anything else that requires a lot of RAM.
---
CPDN Visiting Scientist
ID: 71934
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4559
Credit: 19,039,635
RAC: 18,944
Message 71935 - Posted: 25 Jan 2025, 11:10:03 UTC

but we do need everyone to update their client version when this fixed version is released. In the meantime, we'll only configure 1 task in progress per user.
Presumably it is fixed in 8.1.0, which can be installed following the instructions on the BOINC download page.
ID: 71935
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1066
Credit: 36,887,369
RAC: 1,533
Message 71936 - Posted: 25 Jan 2025, 12:08:08 UTC - in response to Message 71935.  

Presumably it is fixed in 8.1.0, which can be installed following the instructions on the BOINC download page.
v8.1.0 (odd version number) is very much a 'work in progress', and is constantly changing. It can't be downloaded in a 'ready to run' form: it has to be compiled by the user from source code. Some users may be equipped to handle that process, but I don't think it can be recommended for the vast majority of our users.

Instead, there's a version 8.0.4 available on the 'all versions' download page (https://boinc.berkeley.edu/download_all.php), though unfortunately not for Linux, and the instructions for building your own copy have gone AWOL from https://boinc.berkeley.edu/wiki/BuildSystem

I think we probably need to engage with BOINC about getting a usable version of BOINC, and the related documentation, available for the general Linux user. But just at the moment, the key people seem to be tying themselves in knots over an incompatibility between BOINC and VirtualBox on Apple machines.
ID: 71936
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4559
Credit: 19,039,635
RAC: 18,944
Message 71937 - Posted: 25 Jan 2025, 12:40:46 UTC - in response to Message 71936.  
Last modified: 25 Jan 2025, 12:42:49 UTC

It can't be downloaded in a 'ready to run' form: it has to be compiled by the user from source code

On the page with the download instructions there is now an option to choose the 8.1.0 nightly build rather than faff about installing dependencies to compile the code yourself. Instructions for 8.0.4 are also there via the dropdown menu.
ID: 71937
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1066
Credit: 36,887,369
RAC: 1,533
Message 71938 - Posted: 25 Jan 2025, 13:29:02 UTC - in response to Message 71937.  

Yes - I was posting from a Windows machine, and checked it from Linux later.

I sometimes do that when doing housekeeping: work on the Linux machine, while referring to the instructions on a Windows machine and separate screen beside it. And I haven't found a way of making that work in the current state of the BOINC documentation. Linux needs to be listed on the 'download all' page, which is otherwise cross-platform.
ID: 71938
Glenn Carver

Joined: 29 Oct 17
Posts: 1067
Credit: 17,020,946
RAC: 5,160
Message 71940 - Posted: 26 Jan 2025, 10:23:39 UTC - in response to Message 71938.  

CPDN can't recommend users download and compile the latest nightly build though, not as a general 'all-users' statement. We'll have to wait.
---
CPDN Visiting Scientist
ID: 71940
wujj123456

Joined: 14 Sep 08
Posts: 130
Credit: 44,254,664
RAC: 9,487
Message 71941 - Posted: 26 Jan 2025, 19:03:16 UTC - in response to Message 71940.  
Last modified: 26 Jan 2025, 19:06:17 UTC

IMO, if we are going to ask users to do anything in addition to the opt-in, we should just provide an `app_config.xml` template and teach them how to calculate max concurrent tasks on their host. That's how many of us crunched OpenIFS tasks before and we know it's manageable.
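
As an illustration of what such a template and calculation might look like, here is a minimal sketch. Several things in it are assumptions rather than CPDN guidance: the 25 GB per-task figure is simply the upper end of the range in the thread title, the 4 GB reserve for the operating system is a guess, and `oifs_example_app` is a placeholder for whatever the real application name turns out to be. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are the standard BOINC `app_config.xml` ones.

```python
from xml.sax.saxutils import escape

# Skeleton of a BOINC app_config.xml that caps one application's running tasks.
TEMPLATE = """<app_config>
  <app>
    <name>{name}</name>
    <max_concurrent>{limit}</max_concurrent>
  </app>
</app_config>
"""


def max_concurrent_tasks(total_ram_gb, per_task_gb, reserve_gb=4.0):
    """Number of tasks that fit in RAM after keeping reserve_gb free for other use."""
    usable = total_ram_gb - reserve_gb
    if usable < per_task_gb:
        return 0
    return int(usable // per_task_gb)


def build_app_config(app_name, limit):
    """Return app_config.xml text; refuse limits below 1 (host should not opt in)."""
    if limit < 1:
        raise ValueError("host does not have enough memory for even one task")
    return TEMPLATE.format(name=escape(app_name), limit=limit)


if __name__ == "__main__":
    # Hypothetical 64 GB host and an assumed 25 GB peak per task.
    limit = max_concurrent_tasks(total_ram_gb=64, per_task_gb=25)
    print(build_app_config("oifs_example_app", limit))
```

The generated text would be saved as `app_config.xml` in the project's folder inside the BOINC data directory, after which the client needs to re-read its config files or be restarted.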

No project (or even the BOINC client team) could reasonably be expected to sort out how to help users update all the dozens of common distros with ad hoc packages. Installing a self-compiled application or third-party packages while an older version exists in the distro repository can have subtle implications for dependencies and future upgrades. We should treat user systems as production systems, and it's totally fair for them to only use packages from distro repos. This means that even if 8.1.0 is released today, distros like RHEL or its derivatives are probably not going to see it for 2 years. We can't bank on the version upgrade any time soon.

We don't have to make the optimal choices, since it's more about whether we can enable research that is otherwise impossible or too slow. I feel a lot of the discussion here can simply wait until the workload arrives and we collect actual data on success rate and throughput. We can start with the most conservative approach and expand eligible hosts if the error rate is reasonable but throughput is not enough.
1. Release to 48 or 64GB+ hosts, one task per client. Observe the error rate over a few days. This is likely safe.
2. Release to 32GB hosts, one task per client. Observe the error rate for a few days. This might be a bit risky.
3. Relax the one task per host constraint to better utilize bigger hosts. Ideally this would be a separate option in project preferences in addition to the opt-in. This assumes users either have app_config.xml properly configured on all hosts, or are running a new enough BOINC version. This is considerably riskier, but the throughput increase from large memory hosts can be worth the risk. If the CPDN server side can configure a plan class to relax the one task per host constraint only for newer clients, it could remove the risk at the cost of more complexity.
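
The three stages above could be expressed as a small eligibility rule, sketched here purely for illustration. Nothing in it reflects how the CPDN scheduler is actually configured: the `Stage` class, the `tasks_allowed` function, the 25 GB per-task figure and the `client_has_fix` flag (standing in for the "newer clients only" plan class idea) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    min_ram_gb: int          # smallest host RAM that gets work in this stage
    one_task_per_host: bool  # whether the one-task limit is still enforced


# Stage 1 uses the lower of the "48 or 64GB+" thresholds suggested above.
STAGES = [
    Stage("1: 48 GB+ hosts, one task", 48, True),
    Stage("2: 32 GB hosts, one task", 32, True),
    Stage("3: one-task limit relaxed", 32, False),
]


def tasks_allowed(stage, host_ram_gb, per_task_gb=25, client_has_fix=False):
    """Return how many tasks a host may run concurrently in the given stage."""
    if host_ram_gb < stage.min_ram_gb:
        return 0  # host not yet eligible
    if stage.one_task_per_host or not client_has_fix:
        return 1  # conservative default, also for clients without the memory fix
    return max(1, host_ram_gb // per_task_gb)


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, "->", tasks_allowed(stage, 64, client_has_fix=True))
```

Moving from one stage to the next would still be a human decision based on the observed error rate; the sketch only captures who is eligible at each step.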
ID: 71941
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4559
Credit: 19,039,635
RAC: 18,944
Message 71942 - Posted: 26 Jan 2025, 21:17:32 UTC - in response to Message 71940.  

In reply to Glenn Carver's message of 26 Jan 2025:
CPDN can't recommend users download and compile the latest nightly build though, not as a general 'all-users' statement. We'll have to wait.


Agreed. Nor can CPDN expect them to download the testing version of BOINC using the instructions to get it via a package manager.

I would guess that it may be possible to at least test the bugfix on the development site ahead of it being rolled out to normal releases.
ID: 71942