Thread 'New work discussion - 2'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68879 - Posted: 8 Jun 2023, 5:17:28 UTC - in response to Message 68877.  

Looking forward to it. Which app would that be?
Weather at Home Windows tasks.


About how much RAM per task will be required?
I can't remember exactly how much they use but 2GB/core is plenty.

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68880 - Posted: 8 Jun 2023, 10:31:02 UTC - in response to Message 68879.  
Last modified: 8 Jun 2023, 10:31:44 UTC

Weather at Home Windows tasks.

Other people's mileage may vary but I found previously that WaH tasks didn't like being cold restarted (machine on/off) on my Win11 box. They would often abort on restart for reasons I couldn't see in the logs.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68881 - Posted: 8 Jun 2023, 13:03:06 UTC - in response to Message 68880.  

Weather at Home Windows tasks.

Other people's mileage may vary but I found previously that WaH tasks didn't like being cold restarted (machine on/off) on my Win11 box. They would often abort on restart for reasons I couldn't see in the logs.

I would second that, though my impression is that they fall over on restart slightly less often than the Linux Hadley model tasks. Worst were the hadcm3 tasks when these were still available for Linux.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,716,561
RAC: 8,355
Message 68882 - Posted: 8 Jun 2023, 13:19:15 UTC - in response to Message 68881.  

I remember that, many years ago, the hadcm3 tasks were supplied with a BOINC graphics app (certainly for Windows, I wasn't running Linux then). The graphics app gave a detailed live display of the precise computational stage reached - and users were advised to avoid stopping or in any way interrupting a task during a checkpoint, or even close to it.

CPDN checkpoint files are large, and on the hardware of the day (mechanical disks) took a significant time to write to disk. Some at least of the failures were put down to incomplete checkpoint files.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68883 - Posted: 8 Jun 2023, 14:17:13 UTC - in response to Message 68882.  
Last modified: 8 Jun 2023, 14:18:16 UTC

I remember the graphics too. You could choose whether they were showing precipitation, temperature or air pressure on the globe, and you could see if your model went off track and had produced an ice world.

Edit: At least one of the 32-bit libraries was needed in order to display the graphics.

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68884 - Posted: 8 Jun 2023, 16:56:17 UTC - in response to Message 68882.  

I remember that, many years ago, the hadcm3 tasks were supplied with a BOINC graphics app (certainly for Windows, I wasn't running Linux then). The graphics app gave a detailed live display of the precise computational stage reached - and users were advised to avoid stopping or in any way interrupting a task during a checkpoint, or even close to it.
Yes, the graphics code is one of the reasons why the change to 64-bit is an issue, because it's done via shared memory (and not via files).

If I remember right, it was created with the help of the MetO using an application called pv-wave (I could be wrong, it was a long time ago). I've been toying with the idea of using a python executable to do a graphics tool for the OpenIFS app but using files rather than shared memory. It would mean more I/O on the disk though, as the output would have to be rearranged to get a nice time display.
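A minimal sketch of the file-based idea, assuming (hypothetically) that the model drops one small JSON snapshot per output step into a directory and a separate viewer simply reads the newest one. The file naming, the JSON layout and both function names are illustrative assumptions; nothing here is taken from OpenIFS or the CPDN apps.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_frame(frame_dir: Path, step: int, field: list[float]) -> Path:
    """Write one snapshot; zero-padded step numbers keep file names sortable."""
    path = frame_dir / f"frame_{step:06d}.json"
    path.write_text(json.dumps({"step": step, "field": field}))
    return path


def latest_frame(frame_dir: Path) -> dict | None:
    """Return the newest snapshot in the directory, or None if there is none."""
    frames = sorted(frame_dir.glob("frame_*.json"))
    if not frames:
        return None
    return json.loads(frames[-1].read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        frame_dir = Path(tmp)
        for step in range(3):
            write_frame(frame_dir, step, [280.0 + step, 281.0 + step])
        print(latest_frame(frame_dir))
```

The viewer never touches the model's memory, which is the point of the design; the cost is the extra disk I/O mentioned above.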

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68885 - Posted: 8 Jun 2023, 17:41:30 UTC - in response to Message 68884.  

I've been toying with the idea of using a python executable to do a graphics tool for the OpenIFS app but using files rather than via shared memory. It would mean more I/O on the disk though as the output would have to be rearranged to get a nice time display.


I am not interested in graphics displays. Some projects offer them, and some do not. IIRC, I can specify the number of images per second or some such. But after looking at a few, I do not really care to run them. I prefer to use all my spare processor cycles on working on the projects, not projecting movies. If I want to run a movie, I can get them from a local DVD or YouTube or some such.

So, IMAO, go ahead and develop these if you must, but I sure will not be running them.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68886 - Posted: 8 Jun 2023, 19:36:10 UTC - in response to Message 68882.  

I remember that, many years ago, the hadcm3 tasks were supplied with a BOINC graphics app (certainly for Windows, I wasn't running Linux then). The graphics app gave a detailed live display of the precise computational stage reached - and users were advised to avoid stopping or in any way interrupting a task during a checkpoint, or even close to it.

CPDN checkpoint files are large, and on the hardware of the day (mechanical disks) took a significant time to write to disk. Some at least of the failures were put down to incomplete checkpoint files.
Shouldn't Boinc allow full writing of checkpoints before closing? Just like a video editor doesn't give up saving a massive video file just because I'm shutting the machine down. So many schoolboy errors!

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68887 - Posted: 8 Jun 2023, 19:36:43 UTC - in response to Message 68883.  

I remember the graphics too. You could choose whether they were showing precipitation, temperature or air pressure on the globe, and you could see if your model went off track and had produced an ice world.
But ice ages do exist.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68888 - Posted: 8 Jun 2023, 19:38:28 UTC - in response to Message 68885.  

I am not interested in graphics displays. Some projects offer them, and some do not. IIRC, I can specify the number of images per second or some such. But after looking at a few, I do not really care to run them. I prefer to use all my spare processor cycles on working on the projects, not projecting movies. If I want to run a movie, I can get them from a local DVD or YouTube or some such.

So, IMAO, go ahead and develop these if you must, but I sure will not be running them.
I'm the same, I may look now and again out of interest then get bored and leave it to it. But if it attracts people to run them, it's a good idea. Having said that we're not short of users here, we're short of work!

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68889 - Posted: 8 Jun 2023, 19:57:41 UTC - in response to Message 68887.  

But ice ages do exist.


In this context, an ice world refers to a simulation where the whole earth is approaching or has reached 0 K, indicating that the model has gone rogue.

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68890 - Posted: 8 Jun 2023, 20:10:52 UTC - in response to Message 68889.  

But ice ages do exist.
In this context, an ice world refers to a simulation where the whole earth is approaching or has reached 0 K, indicating that the model has gone rogue.
Or the sun went wrong. Crops would be difficult to grow, but we'd manage ;-)

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68891 - Posted: 8 Jun 2023, 22:11:29 UTC - in response to Message 68886.  
Last modified: 8 Jun 2023, 22:11:48 UTC

CPDN checkpoint files are large, and on the hardware of the day (mechanical disks) took a significant time to write to disk. Some at least of the failures were put down to incomplete checkpoint files.
Shouldn't Boinc allow full writing of checkpoints before closing? Just like a video editor doesn't give up saving a massive video file just because I'm shutting the machine down. So many schoolboy errors!
Boinc doesn't know about checkpointing for cpdn's models. The models do it themselves and ignore whatever the client tells them because there is a risk someone may set the checkpoint frequency too high and cause excessive I/O.

The checkpoint files (or any output file really) can be interrupted by boinc killing the process. For example, opening a file and writing to it are separate steps in the code. We might write temperature, then pressure, then close the file. Each is a separate line of code. If the model process is killed between the open and the close then the checkpoint file will be incomplete. There's also buffering to consider before the data actually reaches the physical disk.
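One common way to guard against a half-written file is to write to a temporary file, flush it, and only then rename it over the real checkpoint. The sketch below shows that general technique in Python; it is an assumption for illustration, not how the CPDN models write their checkpoints, and the function name and `.ckpt-` prefix are made up.

```python
import os
import tempfile


def write_checkpoint_atomically(path: str, payload: bytes) -> None:
    """Write payload so that `path` is either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory (same filesystem),
    # so the final rename can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push its buffers to the device
        os.replace(tmp_path, path)  # swap in the finished file in one step
    except BaseException:
        # On failure, remove the partial temp file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed part-way through, a stray temporary file may be left behind, but the previous checkpoint is still complete and usable on restart.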

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 68892 - Posted: 8 Jun 2023, 22:19:55 UTC - in response to Message 68885.  
Last modified: 8 Jun 2023, 22:21:48 UTC

I've been toying with the idea of using a python executable to do a graphics tool for the OpenIFS app but using files rather than via shared memory.
I am not interested in graphics displays. Some projects offer them, and some do not. ... I prefer to use all my spare processor cycles on working on the projects, not projecting movies.

So, IMAO, go ahead and develop these if you must, but I sure will not be running them.
Obviously I'm not interested in developing any code for people not planning on using it, but I am for those that are - I'm not sure why it was necessary to tell me you won't be. There are meteorologists like myself, climate scientists, etc though who may well be interested.

I will do the graphics slightly differently. Rather than 1 display for 1 task, it would be 1 display for all tasks. As most people probably know, modern forecasts are all run as forecast ensembles, each starting from a slightly different initial state. We could have a graphics window displaying the same field but from each of the, say, OpenIFS tasks running on the client. Personally I'd find it interesting to see how the forecasts are different and watch those differences develop.
---
CPDN Visiting Scientist
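As a rough illustration of what a combined display might compute, the sketch below takes the same field from several ensemble members and returns the mean and the spread at each grid point. The flat list-of-values layout and the function name are assumptions for the example; real OpenIFS output would need decoding first.

```python
from __future__ import annotations

from statistics import mean, pstdev


def ensemble_summary(members: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (mean, spread) per grid point across ensemble members."""
    if not members:
        raise ValueError("need at least one ensemble member")
    if len({len(m) for m in members}) != 1:
        raise ValueError("all members must cover the same grid points")
    columns = list(zip(*members))  # regroup values by grid point
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


if __name__ == "__main__":
    # Two hypothetical members, two grid points (temperatures in K).
    print(ensemble_summary([[280.0, 285.0], [282.0, 285.0]]))
```

Grid points where the spread grows over time are where the forecasts are diverging, which is the kind of difference such a display would make visible.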

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68893 - Posted: 8 Jun 2023, 22:36:28 UTC - in response to Message 68891.  

The checkpoint files (or any output file really) can be interrupted by boinc killing the process. For example, opening a file and writing to it are separate steps in the code. We might write temperature, then pressure, then close the file. Each is a separate line of code. If the model process is killed between the open and the close then the checkpoint file will be incomplete. There's also buffering to consider before the data actually reaches the physical disk.
What should happen is:
1) User gives command to shut down computer.
2) Boinc commands all running tasks to come to a halt.
3) Boinc waits for all tasks to report they have finished noting things down.
4) Boinc closes.
5) The OS closes.
If 3) isn't happening, somebody made a very stupid decision when coding Boinc. It would be like a foreman not giving a workman 2 minutes to put his tools away.
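Step 3 only works if the task itself cooperates. A minimal sketch of the task side, assuming a hypothetical model loop that treats a stop request as a flag and checks it only between complete steps, so a checkpoint in progress is always finished first. The names are illustrative and this is not the BOINC API.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum=None, frame=None) -> None:
    """Signal handler: only record the request; the loop decides when to stop."""
    stop_requested.set()


def run_model(total_steps: int, save_checkpoint) -> int:
    """Run until done or asked to stop; return the number of completed steps."""
    completed = 0
    while completed < total_steps and not stop_requested.is_set():
        completed += 1              # stand-in for one model timestep
        save_checkpoint(completed)  # always finishes before the flag is re-checked
    return completed


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_stop)
    print(run_model(5, lambda step: None))
```

The catch is that a long timestep or a large checkpoint can take longer than whoever sent the request is willing to wait.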

Obviously I'm not interested in developing any code for people not planning on using it, but I am for those that are - I'm not sure why it was necessary to tell me you won't be. There are meteorologists like myself, climate scientists, etc though who may well be interested.

I will do the graphics slightly differently. Rather than 1 display for 1 task, it would be 1 display for all tasks. As most people probably know, modern forecasts are all run as forecast ensembles, each starting from a slightly different initial state. We could have a graphics window displaying the same field but from each of the, say, OpenIFS tasks running on the client. Personally I'd find it interesting to see how the forecasts are different and watch those differences develop.
Sounds like you're going to make something useful in the graphics. Perhaps Jean-David was referring to some projects that just make a pretty picture with no useful information.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68894 - Posted: 9 Jun 2023, 0:01:13 UTC - in response to Message 68893.  

Perhaps Jean-David was referring to some projects that just make a pretty picture with no useful information.


Exactly.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,716,561
RAC: 8,355
Message 68895 - Posted: 9 Jun 2023, 7:31:20 UTC - in response to Message 68893.  

2) Boinc commands all running tasks to come to a halt.
BOINC doesn't 'command', BOINC 'requests'.

There are two cases to consider:

a) A planned shutdown under user control, such as for maintenance or upgrading.
b) A forced shutdown by the operating system, such as Windows 11's unfortunate habit of installing updates 'outside working hours'.

For me, a computer outside its working hours is switched off. A computer running BOINC has working hours of 24/7. So, no automatic updates for me.

The difference between the two cases is that, if the user has initiated the shutdown, it is assumed they are still watching the screen - so dialogs can be displayed, and questions asked and answered. That delays the shutdown for ever if the user has wandered off.

An operating system shutdown is silent - no messages are shown, no operator input is allowed. BOINC still 'requests' science applications to close in an orderly fashion, but if the request is ignored (possibly a programming error by the project), BOINC does indeed 'command' the closedown after a delay - or "terminate with extreme prejudice", in the words of one senior project administrator/programmer.
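The request-then-command pattern can be sketched with Python's `subprocess` module: ask the child to exit, wait for a grace period, and kill it only if it has not gone. This is an illustration of the pattern, not BOINC's actual code; the function name and the grace period are assumptions.

```python
import subprocess


def stop_process(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask a child process to exit; kill it if it has not exited in time."""
    # Polite request: SIGTERM on POSIX. On Windows, terminate() and kill()
    # both end the process immediately, so there is no polite stage there.
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop: SIGKILL on POSIX
        proc.wait()
        return "killed"
```

A task that is killed this way gets no chance to finish whatever file it was writing, which is how an incomplete checkpoint can arise.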

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68896 - Posted: 9 Jun 2023, 7:35:06 UTC

Totally agree, there is no point in graphics that are not useful or informative in some way. Of course, when the original graphics program was written I don't think any crunchers had multi-core machines, so there would have been no point in doing it the way you are, Glenn!

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68897 - Posted: 9 Jun 2023, 7:40:20 UTC - in response to Message 68895.  
Last modified: 9 Jun 2023, 7:41:44 UTC

2) Boinc commands all running tasks to come to a halt.
BOINC doesn't 'command', BOINC 'requests'.

There are two cases to consider:

a) A planned shutdown under user control, such as for maintenance or upgrading.
b) A forced shutdown by the operating system, such as Windows 11's unfortunate habit of installing updates 'outside working hours'.

For me, a computer outside its working hours is switched off. A computer running BOINC has working hours of 24/7. So, no automatic updates for me.

The difference between the two cases is that, if the user has initiated the shutdown, it is assumed they are still watching the screen - so dialogs can be displayed, and questions asked and answered. That delays the shutdown for ever if the user has wandered off.

An operating system shutdown is silent - no messages are shown, no operator input is allowed. BOINC still 'requests' science applications to close in an orderly fashion, but if the request is ignored (possibly a programming error by the project), BOINC does indeed 'command' the closedown after a delay - or "terminate with extreme prejudice", in the words of one senior project administrator/programmer.
If you're right then the fault is with Windows. But I believe people have had problems with CPDN tasks with manual shutdowns.

If it's Windows doing it, Windows should allow all programs including Boinc a decent amount of time to close down. There's no hurry to install an update, it could wait 10 minutes! And it should be able to see things are happening, like disk activity. In fact it does know when things are busy: when I shut down I get things like "Gridcoin has not safely closed yet". I can then wait and not press "shutdown anyway".

If the problem is with manual shutdowns, why on earth is Boinc allowing the shutdown? I tell the computer to shut down. Boinc tells the projects to shut down. Messages must come back from the projects to Boinc, then from Boinc to Windows, to say shutting down is completed. Try this: run some VB tasks in Boinc. Now shut down Windows. You'll get the infamous message that VB still has open connections. Ignore it, press nothing, and eventually you will go back to the desktop and the shutdown will be aborted. So we can see it's possible for programs to prevent the shutdown unless the user overrides it.

The current situation is like me getting a lift in your car, telling you to stop, then immediately getting out of the car before you've told me the car has stopped.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,716,561
RAC: 8,355
Message 68898 - Posted: 9 Jun 2023, 7:53:04 UTC - in response to Message 68897.  

Have you never seen the Windows message "waiting for background programs to close" during a shutdown?