climateprediction.net (CPDN) home page
Thread 'New Work Announcements 2024'

Message boards : Number crunching : New Work Announcements 2024

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70257 - Posted: 31 Jan 2024, 22:42:25 UTC - in response to Message 70253.  

The aim is to see how much variation we get when running multiple identical forecasts across all the Linux machines attached to CPDN, and whether each host gives the same result for exactly the same forecast (which is not a given).
Interesting. There... shouldn't be any variation in results for the same code on the same host with the same initial conditions. If there is, look for uninitialized memory reads somewhere, I guess? I know floating point is messy, but it should at least be consistently messy.
There shouldn't be, but it does happen. I once saw a bug in the Intel maths library that caused differences. I forget the details now, as it was some time ago, but I vaguely remember it was related to the way the library handled memory: if vector lengths didn't fit entirely into cache, it summed the numbers in a different order. Anyway, it's worth checking.
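
A minimal Python sketch of the effect described above (an illustration, not CPDN or Intel library code): floating-point addition is not associative, so summing the same numbers in chunks of a different size can change the last bits of the result, while the same order on the same machine reproduces the same bits every time. The function names, block sizes and random test data are hypothetical.

```python
import math
import random

# Floating-point addition is not associative: each intermediate result is
# rounded, so regrouping the same terms can change the final bits.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6


def naive_sum(values):
    """Left-to-right summation, rounding after every single addition."""
    total = 0.0
    for x in values:
        total += x
    return total


def blocked_sum(values, block):
    """Sum `values` in chunks of `block` elements, then sum the partials.

    Hypothetical stand-in for a maths library that splits a long vector into
    cache-sized pieces: a different `block` means a different addition order.
    """
    partials = [naive_sum(values[i:i + block]) for i in range(0, len(values), block)]
    return naive_sum(partials)


rng = random.Random(42)  # fixed seed, so the input data are identical on every run
data = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(100_000)]
reference = math.fsum(data)  # correctly rounded sum, used as a yardstick

for block in (len(data), 4096, 64):
    s = blocked_sum(data, block)
    # Same block size -> same order -> bit-identical result on every run.
    print(f"block={block:>6}  sum={s!r}  diff from fsum={s - reference:+.3e}")
```

The explicit loop in `naive_sum` is deliberate: from Python 3.12 the built-in `sum()` uses compensated summation for floats, which would hide most of the effect.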
---
CPDN Visiting Scientist
ID: 70257
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 70261 - Posted: 1 Feb 2024, 19:54:08 UTC

Do we need to set up some parameters on our Linux boxes to avoid downloading and running too many OIFS tasks at the same time?
ID: 70261
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 70262 - Posted: 1 Feb 2024, 20:07:04 UTC - in response to Message 70261.  

Do we need to set up some parameters on our Linux boxes to avoid downloading and running too many OIFS tasks at the same time?


Certainly not on any machine with a reasonable amount of memory, as there will be a limit of either one or two from the server. In theory a machine with just 16GB could run into problems if it gets two at once while also running other tasks (if the project limit is 2 rather than 1), but the majority of machines should be fine. The major problems as always will come not from those who read the noticeboards but from the set and forget brigade.
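
Independently of any server-side limit, volunteers can cap how many tasks run at once on their own host with BOINC's `app_config.xml` file in the project's directory, using `max_concurrent` for one application and `project_max_concurrent` for the whole project. These options limit running tasks; they may not stop the client from downloading more. A minimal Python sketch that writes such a file is shown here; the application name `oifs_43r3_ps` is an assumption, so check the actual short name (for example in the client's `client_state.xml`) before using it.

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name, max_running, project_max=None):
    """Return the text of a BOINC app_config.xml that limits running tasks.

    app_name:    the application's short name as the BOINC client knows it
                 (the value used below is an assumed, unverified CPDN name).
    max_running: maximum number of tasks of that application running at once.
    project_max: optional cap on running tasks across all of the project's apps.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_running)
    if project_max is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


xml_text = make_app_config("oifs_43r3_ps", max_running=1, project_max=4)
print(xml_text)
# Save the output as app_config.xml in the CPDN project directory, then tell the
# client to re-read it (BOINC Manager: Options > Read config files).
```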
ID: 70262
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 41,744,071
RAC: 63,130
Message 70263 - Posted: 1 Feb 2024, 22:25:59 UTC - in response to Message 70262.  

The major problems as always will come not from those who read the noticeboards but from the set and forget brigade.

Will there be a preference setting that one can override for people that actively monitor the output and have bigger machines?
ID: 70263
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70264 - Posted: 1 Feb 2024, 22:36:08 UTC - in response to Message 70263.  

What kind of preference setting were you thinking of?
The major problems as always will come not from those who read the noticeboards but from the set and forget brigade.

Will there be a preference setting that one can override for people that actively monitor the output and have bigger machines?

---
CPDN Visiting Scientist
ID: 70264
Yeti
Joined: 5 Aug 04
Posts: 178
Credit: 18,746,186
RAC: 44,617
Message 70265 - Posted: 1 Feb 2024, 22:59:14 UTC - in response to Message 70257.  

There shouldn't be, but it does happen. I once saw a bug in the Intel maths library that caused differences. I forget the details now, as it was some time ago, but I vaguely remember it was related to the way the library handled memory: if vector lengths didn't fit entirely into cache, it summed the numbers in a different order. Anyway, it's worth checking.

Yes, if you want more information about this: I remember that this was a big issue for the people at LHC@Home working on the SixTrack application, especially Ben Segal. Perhaps you could discuss this topic with them.

Furthermore, I guess this is the reason why they run all their other projects only on native Linux or in Linux VMs.
Supporting BOINC, a great concept!
ID: 70265
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 41,744,071
RAC: 63,130
Message 70266 - Posted: 2 Feb 2024, 0:38:00 UTC - in response to Message 70264.  
Last modified: 2 Feb 2024, 0:38:23 UTC

What kind of preference setting were you thinking of?

I read "there will be a limit of either one or two from the server" as meaning that even if someone's CPDN project preference sets max # of jobs to "no limit", they will still be limited to 1 or 2 OpenIFS tasks per host. So I wonder how one can get more tasks on a host with a lot of memory without running multiple clients or VMs.

If I remembered wrong, and the default max # of jobs preference is 1 or 2 and you will continue to honor the setting when it is changed to "no limit", then the setting I was asking for already exists.
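
The two readings in this post can be written down as a toy model: the number of OpenIFS tasks a host may hold is the stricter of a server-side cap and the user's "max # of jobs" preference, with "no limit" modelled as no constraint. This is a hypothetical Python sketch of the logic being discussed, not the BOINC scheduler's code; the function name and numbers are illustrative.

```python
from typing import Optional


def effective_task_limit(server_cap: Optional[int],
                         user_max_jobs: Optional[int]) -> Optional[int]:
    """Tasks a host may hold at once; None means unlimited.

    Toy model: both limits apply, so the smaller one wins.
    None stands for "no limit" on either side.
    """
    limits = [x for x in (server_cap, user_max_jobs) if x is not None]
    return min(limits) if limits else None


# Reading 1: server caps OpenIFS at 2 per host, preference left at "no limit".
print(effective_task_limit(server_cap=2, user_max_jobs=None))     # 2
# Reading 2: no server cap; the preference defaults to 2 ...
print(effective_task_limit(server_cap=None, user_max_jobs=2))     # 2
# ... and a user who sets it back to "no limit" gets no cap at all.
print(effective_task_limit(server_cap=None, user_max_jobs=None))  # None
```

Under the first reading only a change on the server helps a large host; under the second, the volunteer can raise the limit themselves.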
ID: 70266
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70267 - Posted: 2 Feb 2024, 2:18:02 UTC - in response to Message 70266.  
Last modified: 2 Feb 2024, 2:18:47 UTC

What kind of preference setting were you thinking of?
I read "there will be a limit of either one or two from the server" as meaning that even if someone's CPDN project preference sets max # of jobs to "no limit", they will still be limited to 1 or 2 OpenIFS tasks per host. So I wonder how one can get more tasks on a host with a lot of memory without running multiple clients or VMs.

If I remembered wrong, and the default max # of jobs preference is 1 or 2 and you will continue to honor the setting when it is changed to "no limit", then the setting I was asking for already exists.
I second this: I have machines with up to 128GB RAM. Limiting those to the same number of tasks as 16GB machines is illogical.
ID: 70267
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 70268 - Posted: 2 Feb 2024, 2:42:14 UTC - in response to Message 70267.  
Last modified: 2 Feb 2024, 2:43:30 UTC

I second this: I have machines with up to 128GB RAM. Limiting those to the same number of tasks as 16GB machines is illogical.

Me too. My Windows 10 machine has about 16 GBytes of RAM (total), but my Linux machine has 128 GBytes of RAM.

The Linux box has a 16-core processor and I am letting up to 13 Boinc tasks run at a time. In warm weather I first cut it down to 12 Boinc tasks and when it is really too hot, I cut it down to 8. I run CPDN, WCG, DENIS, Rosetta, Einstein, Universe in order of decreasing priority.
ID: 70268
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70269 - Posted: 2 Feb 2024, 3:47:00 UTC - in response to Message 70268.  

The Linux box has a 16-core processor and I am letting up to 13 Boinc tasks run at a time. In warm weather I first cut it down to 12 Boinc tasks and when it is really too hot, I cut it down to 8. I run CPDN, WCG, DENIS, Rosetta, Einstein, Universe in order of decreasing priority.
Apologies for going off track here, but there is never a reason for a CPU to be too hot. Improve the cooling system. 17 W/mK heatsink paste, bigger cooler, faster fan, etc.
ID: 70269
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 70270 - Posted: 2 Feb 2024, 4:36:03 UTC - in response to Message 70269.  

Apologies for going off track here, but there is never a reason for a CPU to be too hot. Improve the cooling system. 17 W/mK heatsink paste, bigger cooler, faster fan, etc.


My fans speed up as the temperatures of the box, processor heat sink, etc., rise. But they do not speed up fast enough, so I have diddled the BIOS to run the fans faster. They are now set so that they make so much noise that I can't stand to run them any faster. There is no room in the box for a bigger processor heat sink.

This is how my system is running at the moment. Ambient air temperature is 74°F.
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +76.0°C  (high = +88.0°C, crit = +98.0°C)
Core 8:        +69.0°C  (high = +88.0°C, crit = +98.0°C)
Core 2:        +67.0°C  (high = +88.0°C, crit = +98.0°C)
Core 3:        +71.0°C  (high = +88.0°C, crit = +98.0°C)
Core 5:        +65.0°C  (high = +88.0°C, crit = +98.0°C)
Core 1:        +67.0°C  (high = +88.0°C, crit = +98.0°C)
Core 9:        +70.0°C  (high = +88.0°C, crit = +98.0°C)
Core 11:       +76.0°C  (high = +88.0°C, crit = +98.0°C)
Core 12:       +65.0°C  (high = +88.0°C, crit = +98.0°C)

amdgpu-pci-6500
Adapter: PCI adapter
vddgfx:       +0.96 V  
fan1:        2086 RPM  (min = 1800 RPM, max = 6000 RPM)
edge:         +45.0°C  (crit = +97.0°C, hyst = -273.1°C)
PPT:          10.04 W  (cap =  25.00 W)

dell_smm-virtual-0
Adapter: Virtual device
fan1:        4325 RPM
fan2:        1373 RPM
fan3:        3496 RPM
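
A minimal Python sketch for turning `sensors` output like the above into headroom figures: it parses the `coretemp` lines and reports how far each reading is below its `high` and `crit` thresholds. The regular expression assumes exactly the line format shown here; other drivers and lm-sensors versions may print differently, and recent versions also offer machine-readable JSON via `sensors -j`.

```python
import re

# Matches coretemp lines such as:
#   Core 8:        +69.0°C  (high = +88.0°C, crit = +98.0°C)
CORE_RE = re.compile(
    r"^(Package id \d+|Core \d+):\s+\+?(-?\d+(?:\.\d+)?)°C\s+"
    r"\(high = \+?(-?\d+(?:\.\d+)?)°C, crit = \+?(-?\d+(?:\.\d+)?)°C\)"
)


def core_headroom(sensors_text):
    """Return {label: (temp, degrees below 'high', degrees below 'crit')}."""
    result = {}
    for line in sensors_text.splitlines():
        m = CORE_RE.match(line.strip())
        if m:
            temp, high, crit = (float(g) for g in m.groups()[1:])
            result[m.group(1)] = (temp, high - temp, crit - temp)
    return result


# On a live system the text could come from:
#   subprocess.run(["sensors"], capture_output=True, text=True).stdout
sample = """\
Package id 0:  +76.0°C  (high = +88.0°C, crit = +98.0°C)
Core 8:        +69.0°C  (high = +88.0°C, crit = +98.0°C)
Core 11:       +76.0°C  (high = +88.0°C, crit = +98.0°C)
"""
for label, (temp, to_high, to_crit) in core_headroom(sample).items():
    print(f"{label}: {temp:.1f}°C, {to_high:.1f} below high, {to_crit:.1f} below crit")
```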

ID: 70270
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 70271 - Posted: 2 Feb 2024, 5:49:24 UTC

I second this: I have machines with up to 128GB RAM. Limiting those to the same number of tasks as 16GB machines is illogical.


Though it is not illogical for the forthcoming batch, which is looking at variance between machines. For this batch I assume the aim is to get as many different machines involved as possible.
ID: 70271
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70272 - Posted: 2 Feb 2024, 6:07:46 UTC - in response to Message 70270.  
Last modified: 2 Feb 2024, 6:09:50 UTC

This is how my system is running at the moment.
Most CPUs are fine up to 95C and will auto-throttle at that temperature. I have an old Xeon server where one of the CPUs hovers around 95C. Some days it throttles a little, but I don't care: the CPU stops itself from getting damaged. You don't need to adjust Boinc manually. Run the fans as fast as you can tolerate the noise (for example, you could set them in the BIOS never to exceed 70% speed), then let the machine work as hard as it can. Heatsink paste can make a serious difference (I've made graphics cards 20C cooler that way), and it takes up no space. You could also get a larger case; I always use full towers.
ID: 70272
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70276 - Posted: 2 Feb 2024, 11:38:14 UTC - in response to Message 70266.  

We need to change the default on that project setting of 'default max # jobs' from 'no limit' to 1 or 2. Otherwise we're back where we started. Thanks for mentioning that.

If I remember right, the server is (at the moment) still set to deliver only 1-2 WUs per host for OIFS. I will check. We had lots of problems with OIFS on lower-memory machines, with complaints and volunteers dropping out of the project, which we don't want. There are nearly 1000 Linux volunteer hosts, most of which don't read the forums, so we have to come up with a configuration that works for the majority, learn, and then start catering for people with bigger machines.

I think this is the right approach, particularly as we have not yet rolled out the multicore, higher-resolution configurations, which will take upwards of 20 GB of RAM. I want to see how the community responds to these tasks first.

What kind of preference setting were you thinking of?

I read "there will be a limit of either one or two from the server" as meaning that even if someone's CPDN project preference sets max # of jobs to "no limit", they will still be limited to 1 or 2 OpenIFS tasks per host. So I wonder how one can get more tasks on a host with a lot of memory without running multiple clients or VMs.

If I remembered wrong, and the default max # of jobs preference is 1 or 2 and you will continue to honor the setting when it is changed to "no limit", then the setting I was asking for already exists.

---
CPDN Visiting Scientist
ID: 70276
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70279 - Posted: 2 Feb 2024, 12:03:10 UTC - in response to Message 70276.  
Last modified: 2 Feb 2024, 12:03:30 UTC

Has the default always been no limit, so that mine is on no limit by default rather than because I chose it? Then this change would take every single person down to 1-2 cores. Fair enough, but there will be a lot of people with powerful machines who don't read here. Perhaps a notification could appear in Boinc saying "put this up again if you have x GB RAM"?

Even better would be the server intelligently limiting tasks per GB.
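
Limiting tasks per GB could look something like this hypothetical Python sketch: allow as many high-memory tasks as fit in the host's RAM after a reserve for the operating system and other work, and let a user preference lower (but not raise) that figure. The 4 GB reserve and the 5 GB per-task figure are assumptions for illustration, not CPDN or BOINC server settings; the 20 GB case echoes the higher-resolution tasks mentioned earlier in the thread.

```python
import math
from typing import Optional


def memory_based_limit(host_ram_gb: float, task_ram_gb: float,
                       reserve_gb: float = 4.0,
                       user_max: Optional[int] = None) -> int:
    """Hypothetical per-host cap on high-memory tasks, derived from installed RAM.

    reserve_gb: memory kept free for the OS and other BOINC work (assumed value).
    user_max:   optional user preference; the stricter of the two limits wins.
    """
    usable = max(0.0, host_ram_gb - reserve_gb)
    limit = math.floor(usable / task_ram_gb)
    if user_max is not None:
        limit = min(limit, user_max)
    return limit


# Illustrative task sizes on the host sizes mentioned in this thread.
for task_gb in (5.0, 20.0):
    for ram_gb in (16, 128):
        n = memory_based_limit(ram_gb, task_gb)
        print(f"{ram_gb:>3} GB host, {task_gb:>4.0f} GB tasks -> {n} at once")
```

On this model a 16 GB host would get two of the assumed 5 GB tasks and none of the 20 GB ones, while a 128 GB host would get 24 and 6 respectively.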
ID: 70279
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 70280 - Posted: 2 Feb 2024, 12:08:19 UTC - in response to Message 70279.  

Has the default always been no limit, so that mine is on no limit by default rather than because I chose it? Then this change would take every single person down to 1-2 cores. Fair enough, but there will be a lot of people with powerful machines who don't read here. Perhaps a notification could appear in Boinc saying "put this up again if you have x GB RAM"?

Even better would be the server intelligently limiting tasks per GB.


The default has been no limit except, I think, for one recent batch of high-memory OIFS tasks. I agree that setting the limit according to the available memory would be a good idea, but someone would have to request that feature over at GitHub. (I haven't checked whether such a request has been made and, if so, what the response was.)
ID: 70280
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70283 - Posted: 2 Feb 2024, 12:22:09 UTC - in response to Message 70280.  

The default has been no limit except, I think, for one recent batch of high-memory OIFS tasks.
As long as it's user-changeable. At the moment there is only one setting, and mine is on no limit. There would need to be a setting for each type, as there's no point in limiting other types of task like WAH2.
ID: 70283
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 70284 - Posted: 2 Feb 2024, 12:29:46 UTC - in response to Message 70283.  

As long as it's user-changeable. At the moment there is only one setting, and mine is on no limit. There would need to be a setting for each type, as there's no point in limiting other types of task like WAH2.


It isn't user-changeable via the website the way it is at WCG. This is purely server-side, and the limit will only apply to tasks that demand a lot of memory. Presumably there will also be a limit of one for the batch that checks whether all hosts give the same results.
ID: 70284
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70285 - Posted: 2 Feb 2024, 12:31:50 UTC - in response to Message 70284.  

This needs to be a new thread if we are going to discuss user/project preferences at length.
ID: 70285
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70286 - Posted: 2 Feb 2024, 12:47:08 UTC

I think Glenn ought to have moderator privileges to move things around.
ID: 70286
©2024 cpdn.org