climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69567 - Posted: 3 Sep 2023, 6:03:01 UTC

I will say that if there are two different model types from the high memory requirement stable, it is possible to run one of each at the same time without issues. (Assuming you have enough memory to do so!)
ID: 69567

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69568 - Posted: 3 Sep 2023, 6:33:49 UTC - in response to Message 69564.  

kitten
For some reason it's a gun in the game Generation Zero. Maybe it fires kittens.
ID: 69568

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69569 - Posted: 3 Sep 2023, 6:35:26 UTC - in response to Message 69567.  

I will say that if there are two different model types from the high memory requirement stable, it is possible to run one of each at the same time without issues. (Assuming you have enough memory to do so!)
I can't see why there would be a problem. They're separate programs. It would be like my word processor interfering with my email program.
ID: 69569

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69570 - Posted: 3 Sep 2023, 7:43:01 UTC

I can't see why there would be a problem.
Neither can I but given the number of times I have found problems where I couldn't see why there would be any over the years, it is nice to confirm it in practice.
ID: 69570

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,454,880
RAC: 15,098
Message 69571 - Posted: 3 Sep 2023, 11:53:44 UTC - in response to Message 69570.  

I can't see why there would be a problem.
Neither can I but given the number of times I have found problems where I couldn't see why there would be any over the years, it is nice to confirm it in practice.
One issue is that these models move a lot of data to and from memory, so the memory bandwidth could saturate, slowing the response of other programs (I've seen Chrome's response slow down, for example). The other concern is disk I/O. The hi-mem OIFS models will be writing larger checkpoint (aka restart) files to disk. We need time to tune the model I/O so as not to cause problems. Remember these codes are designed to run on High Performance Computers with 1000s of nodes & a fast I/O subsystem.

We will send these out to the wider volunteers but not until we're satisfied they are not going to cause problems for the majority of users who don't read these forums.
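
To get a rough feel for the checkpoint-write cost described above, a volunteer could time how long the BOINC data drive takes to flush a file of a given size. The sketch below is purely illustrative: the function name `time_checkpoint_write` and the 64 MiB size are assumptions made for the example, not figures from CPDN or OpenIFS, and a file of zeros may flatter drives or filesystems that compress data.

```python
import os
import tempfile
import time


def time_checkpoint_write(size_mib, directory=None):
    """Write size_mib MiB to a temporary file, force it to disk, return MiB/s.

    directory is where the temporary file is created (default: system temp dir).
    Point it at the BOINC data partition to test that drive.
    """
    chunk = bytes(1024 * 1024)  # 1 MiB of zeros
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # wait until the data has really reached the drive
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mib / max(elapsed, 1e-9)


if __name__ == "__main__":
    # 64 MiB is a made-up size for illustration only.
    print(f"{time_checkpoint_write(64):.0f} MiB/s")
```

Running it a few times while the machine is busy and while it is idle gives a crude idea of how much headroom the drive has.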
ID: 69571

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69572 - Posted: 3 Sep 2023, 12:02:09 UTC - in response to Message 69571.  

One issue is that these models move a lot of data to and from memory, so the memory bandwidth could saturate, slowing the response of other programs (I've seen Chrome's response slow down, for example). The other concern is disk I/O. The hi-mem OIFS models will be writing larger checkpoint (aka restart) files to disk. We need time to tune the model I/O so as not to cause problems. Remember these codes are designed to run on High Performance Computers with 1000s of nodes & a fast I/O subsystem.

We will send these out to the wider volunteers but not until we're satisfied they are not going to cause problems for the majority of users who don't read these forums.
I think most of us have SSDs by now. I gave up on rust spinners for anything but backups, security cameras, and TV/Films years ago.
ID: 69572

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69573 - Posted: 3 Sep 2023, 15:10:05 UTC

I think most of us have SSDs by now.
I am guessing that without a reasonably fast NVME drive, some users will notice the slowdown. The partition where I have all my BOINC data is on one of these. In the next lot of testing, I might try one on the VM, which is on the SATA SSD, to see if it slows down browsing response or anything, though whether it will have more effect than the slowdown from running a VM is a moot point.
ID: 69573

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69574 - Posted: 3 Sep 2023, 15:29:53 UTC - in response to Message 69573.  

I think most of us have SSDs by now.
I am guessing that without a reasonably fast NVME drive, some users will notice the slowdown. The partition where I have all my BOINC data is on one of these. In the next lot of testing, I might try one on the VM, which is on the SATA SSD, to see if it slows down browsing response or anything, though whether it will have more effect than the slowdown from running a VM is a moot point.
Only my older computers have SATA SSDs, the ones without an NVME socket. But they won't be processing fast enough to care.
ID: 69574

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 69575 - Posted: 3 Sep 2023, 19:11:02 UTC - in response to Message 69572.  

I think most of us have SSDs by now. I gave up on rust spinners for anything but backups, security cameras, and TV/Films years ago.


I am guessing that without a reasonably fast NVME drive, some users will notice the slowdown.


Well, I do have an NVMe drive on my machine, but the partition for BOINC is on a SATA hard drive. OTOH, the other two partitions on that drive store videos and sound files that I seldom use, and surely I would go at least 8 hours a day without using them at all, and my machine runs 24/7 except for occasional system updates. So writing checkpoint files will, at least, not be doing a lot of seeking on that drive.

The other concern is disk I/O. The hi-mem OIFS models will be writing larger checkpoint (aka restart files) to disk. We need time to tune the model I/O so not to cause problems.


IIRC when the OIFS tasks were being sent out early this year, I was running 3 or 4 of those at a time with no problems with computation or even trickle uploads. I do have a 75 megabit/sec Internet connection.
ID: 69575

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69576 - Posted: 4 Sep 2023, 6:38:38 UTC - in response to Message 69575.  

Well, I do have an NVMe drive on my machine, but the partition for BOINC is on a SATA hard drive. OTOH, the other two partitions on that drive store videos and sound files that I seldom use, and surely I would go at least 8 hours a day without using them at all, and my machine runs 24/7 except for occasional system updates. So writing checkpoint files will, at least, not be doing a lot of seeking on that drive.
AFAIK seeking should have no effect on an SSD. It's more like RAM, it just sends what you ask for, nothing to physically move.
ID: 69576

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69577 - Posted: 4 Sep 2023, 8:18:22 UTC - in response to Message 69576.  

AFAIK seeking should have no effect on an SSD. It's more like RAM, it just sends what you ask for, nothing to physically move.
Yes, the issue is a potential bottleneck with disk writes and reads leaving the CPU waiting. More testing will tell us how much of an issue this is.
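
On Linux, one quick indicator of the CPU waiting on disk is the `wa` (I/O wait) percentage on the `%Cpu(s)` line of `top`. As a minimal sketch, assuming the line layout shown in the `top` output posted in message 69583 of this thread, the states can be pulled out like this (the helper name `cpu_states` is made up for the example):

```python
import re
from typing import Dict

# The %Cpu(s) line from the top output quoted in message 69583.
EXAMPLE_LINE = (
    "%Cpu(s):  0.8 us,  5.4 sy, 68.9 ni, 24.7 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st"
)


def cpu_states(top_cpu_line: str) -> Dict[str, float]:
    """Map each two-letter CPU state label (us, sy, ni, id, wa, ...) to its percentage."""
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b", top_cpu_line)
    return {label: float(value) for value, label in pairs}


if __name__ == "__main__":
    states = cpu_states(EXAMPLE_LINE)
    print("I/O wait:", states["wa"], "%")
    print("niced (BOINC) work:", states["ni"], "%")
```

A `wa` value that stays near zero while tasks are checkpointing suggests the drive is keeping up; a persistently high value would point to the bottleneck discussed here.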
ID: 69577

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69578 - Posted: 4 Sep 2023, 9:11:05 UTC - in response to Message 69577.  

AFAIK seeking should have no effect on an SSD. It's more like RAM, it just sends what you ask for, nothing to physically move.
Yes, the issue is a potential bottleneck with disk writes and reads leaving the CPU waiting. More testing will tell us how much of an issue this is.
If CPDN sends us all a super fast NVME each, we will be glad to run your program.
ID: 69578

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69579 - Posted: 4 Sep 2023, 9:25:44 UTC

If CPDN sends us all a super fast NVME each, we will be glad to run your program.
Just like other projects do?
ID: 69579

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69580 - Posted: 4 Sep 2023, 10:06:55 UTC - in response to Message 69579.  

If CPDN sends us all a super fast NVME each, we will be glad to run your program.
Just like other projects do?
I was joking. But if it needs better hardware, I buy it. I have stupid amounts of RAM to run LHC for example.
ID: 69580

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,454,880
RAC: 15,098
Message 69581 - Posted: 4 Sep 2023, 11:19:37 UTC - in response to Message 69579.  

If CPDN sends us all a super fast NVME each, we will be glad to run your program.
Just like other projects do?
I run my BOINC projects off old spinning hard drives. File I/O is not the bottleneck, core speed is. I/O accounts for roughly 10% of wall clock time for OpenIFS. Probably similar for the Hadley models.
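
As a worked illustration of why a faster drive helps so little when I/O is only about 10% of wall clock time, Amdahl's law puts a ceiling on the gain. This is a sketch under the assumption that the I/O share and the compute share do not overlap; the function name is invented for the example.

```python
def speedup_from_faster_io(io_fraction: float, io_speedup: float = float("inf")) -> float:
    """Amdahl's law: overall speedup when only the I/O share of the runtime gets faster.

    io_fraction: share of wall clock time spent on I/O (0.10 for the 10% quoted above).
    io_speedup:  how many times faster the I/O becomes (infinity = I/O costs nothing).
    """
    if not 0.0 <= io_fraction < 1.0:
        raise ValueError("io_fraction must be in [0, 1)")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # Even an infinitely fast drive gives at most about an 11% gain.
    print(f"{speedup_from_faster_io(0.10):.2f}x")
    # A drive twice as fast gives about 5%.
    print(f"{speedup_from_faster_io(0.10, 2.0):.2f}x")
```

With a 10% I/O share the upper bound is 1/0.9, roughly 1.11, which is consistent with core speed being the dominant factor.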
ID: 69581

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69582 - Posted: 4 Sep 2023, 11:48:57 UTC - in response to Message 69581.  
Last modified: 4 Sep 2023, 11:51:04 UTC

I run my BOINC projects off old spinning hard drives. File I/O is not the bottleneck, core speed is. I/O accounts for roughly 10% of wall clock time for OpenIFS. Probably similar for the Hadley models.
I thought you were talking about high I/O for CPDN a minute ago?

I stopped using rust spinners for 4 reasons:

All my spinning drives have worn out, except the last few I sold for about £4 each, and bought SSDs for £8 each. At those prices, why not upgrade?

They were too slow for LHC; the program was actually getting impatient and returning an error (but that's VirtualBox for you).

They were agonisingly slow to boot the OS, so any restarts to sort problems were making me want to punch the screen.

If you accidentally go over the RAM limit and go into the pagefile, a rust spinner grinds the computer to a halt, so much so you can't even use the interface to stop the problem.
ID: 69582

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 69583 - Posted: 4 Sep 2023, 15:14:05 UTC - in response to Message 69582.  
Last modified: 4 Sep 2023, 15:15:05 UTC

If you accidentally go over the RAM limit and go into the pagefile, a rust spinner grinds the computer to a halt, so much so you can't even use the interface to stop the problem.


For sure. But my machine has 128 GBytes of RAM and 16 cores, of which 12 are allowed for BOINC. Furthermore I set app_config files to limit how many of each type of task are allowed to run. So I do not remember ever using the pagefile for much of anything. Running 24/7 for a little over three days, I seem to be using only one megabyte of pagefile. And that pagefile is on the reasonably fast NVME drive.
top - 10:59:10 up 3 days, 17:18,  2 users,  load average: 12.46, 12.70, 12.54
Tasks: 471 total,  11 running, 460 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us,  5.4 sy, 68.9 ni, 24.7 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st
MiB Mem : 128086.0 total,   1183.3 free,   7824.3 used, 119078.5 buff/cache
MiB Swap:  15992.0 total,  15991.0 free,      1.0 used. 118833.2 avail Mem 
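
For anyone who has not used per-application limits like the ones mentioned above, BOINC reads them from an `app_config.xml` file in the project's folder. A minimal sketch that generates such a file follows. The application names `oifs_43r3_ps` and `wah2` and the limits are assumptions for illustration only; the names must match the application names the client reports for the project, and the helper `build_app_config` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits, project_max=None):
    """Return app_config.xml text with one <app> entry per application.

    limits:      dict mapping application name -> max_concurrent value.
    project_max: optional cap on all tasks from the project (project_max_concurrent).
    """
    root = ET.Element("app_config")
    for app_name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if project_max is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical limits: at most 2 OpenIFS tasks and 4 of another type at once.
    print(build_app_config({"oifs_43r3_ps": 2, "wah2": 4}, project_max=5))
```

The generated text would be saved as `app_config.xml` in the project's directory under the BOINC data folder, after which the client needs to re-read its config files (or be restarted) for the limits to apply.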

ID: 69583

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69584 - Posted: 4 Sep 2023, 15:18:19 UTC - in response to Message 69583.  

For sure. But my machine has 128 GBytes of RAM and 16 cores, of which 12 are allowed for BOINC. Furthermore I set app_config files to limit how many of each type of task are allowed to run. So I do not remember ever using the pagefile for much of anything. Running 24/7 for a little over three days, I seem to be using only one megabyte of pagefile. And that pagefile is on the reasonably fast NVME drive.
I turned mine off to save disk space. Got the usual grumble from Windows that it wouldn't be able to store a memory dump when it crashed. Like those are ever helpful.
ID: 69584

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,454,880
RAC: 15,098
Message 69600 - Posted: 7 Sep 2023, 11:20:56 UTC

I understand from talking with CPDN today that the assessment of the modified configuration for the Korean project has been done, and the reduced grid produces satisfactory results compared with the results that did actually run from the previous batch. We should get a new batch in the next couple of weeks, pending one or two more tests.
---
CPDN Visiting Scientist
ID: 69600

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,033,008
RAC: 19,749
Message 69602 - Posted: 7 Sep 2023, 15:18:09 UTC - in response to Message 69600.  

Good to hear, Glenn. I am guessing these should be OK to run under WINE?
ID: 69602