climateprediction.net (CPDN) home page
Thread 'Big models'

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62637 - Posted: 7 Aug 2020, 1:52:00 UTC
Last modified: 7 Aug 2020, 1:58:06 UTC

Survey time

Some testing going on, but DON'T get too excited yet.
(Linux at present.)

Question: How do people feel about a monthly upload of around 193 MB?
ID: 62637
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 62638 - Posted: 7 Aug 2020, 4:39:48 UTC - in response to Message 62637.  

Question: How do people feel about a monthly upload of around 193 MB?


1. To be clear, that is model months, not real time.
2. With my current setup this is not an issue. When my Ryzen arrives, it could be running up to 16 tasks at a time, finishing them in five or six days. For the last Linux batch sent out, which is 4 model months per task, that is 193 MB x 4 x 16 = over 12 GB every 5 days. On my "bored band", at an average upload speed of about 60 KB/s...

I don't think I will be running 16 of them, probably not more than 4.
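As a rough check of the figures above, the sketch below works out how long the 60 KB/s link would be busy per batch cycle. It assumes decimal units (1 MB = 1000 KB) and continuous uploading at the quoted average speed; the function and constant names are made up for illustration.

```python
# Back-of-envelope check of the upload figures quoted in this post.
MB_PER_MODEL_MONTH = 193     # upload size per model month (from the post)
MODEL_MONTHS_PER_TASK = 4    # length of the last Linux batch (from the post)
UPLOAD_KB_PER_S = 60         # average upload speed (from the post)
KB_PER_MB = 1000             # assumption: decimal units
SECONDS_PER_DAY = 86_400


def total_upload_mb(tasks: int) -> int:
    """Total upload volume in MB for `tasks` tasks run side by side."""
    return MB_PER_MODEL_MONTH * MODEL_MONTHS_PER_TASK * tasks


def upload_days(total_mb: float, kb_per_s: float = UPLOAD_KB_PER_S) -> float:
    """Days of continuous uploading needed to send `total_mb`."""
    return total_mb * KB_PER_MB / kb_per_s / SECONDS_PER_DAY


for tasks in (16, 4):
    mb = total_upload_mb(tasks)
    print(f"{tasks:2d} tasks: {mb / 1000:.1f} GB, about {upload_days(mb):.1f} days of uploading")
```

Under these assumptions, 16 tasks would keep the link busy for a bit under two and a half days out of every five or six, while 4 tasks would need a little over half a day.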
ID: 62638
alanb1951
Joined: 31 Aug 04
Posts: 37
Credit: 9,581,380
RAC: 3,853
Message 62639 - Posted: 7 Aug 2020, 6:32:35 UTC - in response to Message 62637.  

Question: How do people feel about a monthly upload of around 193 MB?

The answer to that rather depends on how long it takes to produce a month's worth of data to upload! If these are going to be models that do several years in a single job, that could be several "months" per real-time day, after all.

And there's another issue that may be critical - checkpointing. If the checkpoints are as frequent as they were on those HadCM3s ones we had last Autumn, folks with ext4 filestore are going to be a tad unhappy! And, of course, if there are too many jobs running at once the machine could become disk-bound if running spinning media rather than solid-state...

(ext4 is more or less the default nowadays, I believe, and as far as I am aware there is no way to avoid more or less immediate writes without turning journaling off, which rather defeats the point... One could work around it by putting [part of] /var/lib/boinc-client on a separate ext3 partition and playing with cache parameters, I suppose, but not everyone is a Linux guru...)

The above said, it's good to know there might be some new work in the pipeline, and perhaps it'll be 64-bit and more tuned to modern hardware???

Cheers - Al.
ID: 62639
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62640 - Posted: 7 Aug 2020, 6:51:57 UTC

Early days, but:

computer: i7-4770
one month model
2 calendar days to complete
ID: 62640
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 62641 - Posted: 7 Aug 2020, 7:33:48 UTC - in response to Message 62640.  

OK, but it seems a little small. I will be taking a Ryzen 3600 out of summer lockdown in a month or so, and with a 20 Mbps (or 2.5 MB/s) upload speed, I won't have that much to do.
But give it your best shot, please.
ID: 62641
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 62642 - Posted: 7 Aug 2020, 9:56:32 UTC

The above said, it's good to know there might be some new work in the pipeline, and perhaps it'll be 64-bit and more tuned to modern hardware???


These will still be HadAM4 Met Office models, so 32-bit unfortunately.
ID: 62642
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 62643 - Posted: 7 Aug 2020, 14:35:23 UTC

These are the hadam4h N216 models that have been the main batches we've been running lately on Linux. The ones we've been running on the main site have model month uploads of ~145 MB. In the newly tested version, the model month uploads will be ~195 MB, so about 35% more per upload.
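The size comparison above can be checked with a couple of lines. This is only a sketch using the approximate figures from the post; the helper name is illustrative.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from `old` to `new`, in percent."""
    return (new - old) / old * 100


OLD_UPLOAD_MB = 145  # approximate current upload per model month (from the post)
NEW_UPLOAD_MB = 195  # approximate upload per model month in the tested version

increase = percent_increase(OLD_UPLOAD_MB, NEW_UPLOAD_MB)
print(f"Uploads grow by about {increase:.1f}% per model month")
```

With these rounded inputs the result is a little under 35%, which matches the "about 35%" quoted above.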
ID: 62643
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 62644 - Posted: 7 Aug 2020, 14:56:59 UTC - in response to Message 62643.  

These are the hadam4h N216 models that have been the main batches we've been running lately on Linux.

Thanks. I limit them to two (or maybe four) per machine for maximum efficiency.
The old ones have run well on a Ryzen 3600 (virtual cores) or i7-9700 (full cores) with that, and I expect the new ones will too.
ID: 62644
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,363,137
RAC: 15,665
Message 62645 - Posted: 7 Aug 2020, 22:18:30 UTC - in response to Message 62644.  
Last modified: 7 Aug 2020, 22:19:31 UTC

I'd probably be limiting to 2 working on my virtual machine. Seems to be most efficient timewise for data output. Broadband speed not a problem as we are on cable here.
ID: 62645
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 62646 - Posted: 8 Aug 2020, 1:54:33 UTC - in response to Message 62637.  
Last modified: 8 Aug 2020, 1:57:17 UTC

193 MB is huge if you are on a satellite or cell phone connection. This is not an issue for me currently, but in the past it would have been a show-stopping hurdle. At this point I would support it because I have bandwidth to spare.

To be honest, CPU cycles are what I struggle with these days. I know that climate change will kill 90% of all life on Earth if we don't stop it. I also know that human civilization as we know it will not survive COVID-19 unless we find multiple ways to treat its symptoms and immunize against it. If we can't do that then humanity's governments will go insane and climate change will be on our Christmas Wish List. So I have every CPU I own (two ARM CPUs, one Intel, three AMD) and one more (ARM) that I ordered today working on WCG OpenPandemic tasks. Don't get me wrong. I have a great faith in each and every one of you as a person. It's just that, as a species, I think we are dumber than a bag of hammers.

So until I can stop worrying about the knock-on effects of a species so small I can't even see it, I will only do work for CPDN when there is a solution to COVID or a lull in the workflow.
ID: 62646
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 62647 - Posted: 8 Aug 2020, 2:11:55 UTC - in response to Message 62646.  

It's just that, as a species, I think we are dumber than a bag of hammers.

We haven't exactly distinguished ourselves.
I do a lot of COVID-19 work, but the anti-virals that will save us near-term are already under test now, and should be available by the end of the year.
The computer studies I think are more relevant for the next-generation viruses. We need to start work now.
ID: 62647
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 62648 - Posted: 8 Aug 2020, 16:30:02 UTC - in response to Message 62646.  

Don't get me wrong. I have a great faith in each and every one of you as a person. It's just that, as a species, I think we are dumber than a bag of hammers.


LOL. I've read this a lot recently (in various forms), and I couldn't agree more. This is especially evident in recent years with the proliferation of many so-called "news" media sources and social media.
ID: 62648
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,363,137
RAC: 15,665
Message 62649 - Posted: 8 Aug 2020, 22:45:50 UTC - in response to Message 62648.  

Possibly more so some of the so-called leaders at the top and some of the people advising them. Look at what happened in the Spanish flu pandemic in 1918/19.
ID: 62649
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62650 - Posted: 8 Aug 2020, 23:05:15 UTC

Well, it looks like no one is against bigger uploads, so the researchers can go ahead with the current model.
ID: 62650
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 62651 - Posted: 9 Aug 2020, 1:35:45 UTC - in response to Message 62650.  

Will these larger models also use more RAM? If so can we get a hint as to the numbers we should expect?
ID: 62651
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 62652 - Posted: 9 Aug 2020, 2:27:04 UTC - in response to Message 62651.  

Will these larger models also use more RAM? If so can we get a hint as to the numbers we should expect?

Nope, same as the current hadam4h N216 models, about 1.4 GB per task.
ID: 62652
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,706,019
RAC: 5,585
Message 62657 - Posted: 11 Aug 2020, 6:39:57 UTC - in response to Message 62650.  

Well, it looks like no one is against bigger uploads, so the researchers can go ahead with the current model.


What would be the checkpoint interval? I can't recall exactly, but the checkpoint interval on my i7-4790 was 40-60 mins. Any consideration of reducing it a bit?
ID: 62657
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62658 - Posted: 11 Aug 2020, 9:20:24 UTC

This project has just started, with one short run, and the only matter of interest at present is the extra data being collected, resulting in bigger uploads.
Later in the year, the other matters raised may get answered.
ID: 62658
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 63063 - Posted: 2 Dec 2020, 9:18:03 UTC

Another batch of OpenIFS went out for testing last night. The upload after 25 minutes is 275 MB, so it takes longer to upload than to compute on my Ryzen, which would get interesting if that is still true when it finally makes it to the main site and big batches! (I suspect tasks will be much longer with similar-sized uploads, but I am in the land of guessing there!)

Peak memory usage was 26%, so a bit over 8 GB, so I won't be running them on the old laptop!
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
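To put the "longer to upload than compute" remark in numbers, here is a small sketch. It assumes the 60 KB/s upload speed mentioned earlier in the thread still applies, and decimal units (1 MB = 1000 KB); the names are hypothetical.

```python
UPLOAD_MB = 275        # upload size of the test task (from the post)
COMPUTE_MINUTES = 25   # compute time of the test task (from the post)
LINK_KB_PER_S = 60     # assumption: upload speed quoted earlier in the thread
KB_PER_MB = 1000       # assumption: decimal units


def break_even_kb_per_s(upload_mb: float, compute_minutes: float) -> float:
    """Upload speed at which sending the result takes as long as computing it."""
    return upload_mb * KB_PER_MB / (compute_minutes * 60)


def upload_minutes(upload_mb: float, kb_per_s: float) -> float:
    """Minutes needed to upload `upload_mb` at `kb_per_s`."""
    return upload_mb * KB_PER_MB / kb_per_s / 60


needed = break_even_kb_per_s(UPLOAD_MB, COMPUTE_MINUTES)
actual = upload_minutes(UPLOAD_MB, LINK_KB_PER_S)
print(f"Break-even upload speed: about {needed:.0f} KB/s")
print(f"Upload time at {LINK_KB_PER_S} KB/s: about {actual:.0f} minutes")
```

On those assumptions, one task alone would need roughly 183 KB/s of sustained upload to keep pace, and at 60 KB/s each upload would take about three times as long as the computation.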
ID: 63063
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63067 - Posted: 2 Dec 2020, 16:02:20 UTC - in response to Message 63063.  

All very interesting, but have you been able to run more than one at once?
Is there any obvious slowdown due to it?
ID: 63067