Message boards : Number crunching : Big models
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Survey time. Some testing going on, but DON'T get too excited yet. (Linux at present.) Question: How do people feel about a monthly upload of around 193 MB? |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Question: How do people feel about a monthly upload of around 193 MB? 1. To be clear, that is model months, not real time. 2. With my current setup it is not an issue. When my Ryzen arrives, it could be running up to 16 tasks at a time, finishing them in five or six days, so for the last Linux batch sent out, which is 4 model months long, that is 193 x 4 x 16 = over 12 GB every 5 days. On my bored band at an average of about 60 KB/s upload speed... I don't think I will be running 16 of them, probably not more than 4. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
Question: How do people feel about a monthly upload of around 193 MB? The answer to that rather depends on how long it takes to produce a month's worth of data to upload! If these are going to be models that do several years in a single job, that could be several "months" per real-time day, after all. And there's another issue that may be critical - checkpointing. If the checkpoints are as frequent as they were on those HadCM3s ones we had last autumn, folks with ext4 filestore are going to be a tad unhappy! And, of course, if there are too many jobs running at once the machine could become disk-bound if running spinning media rather than solid-state... (ext4 is more or less the default nowadays, I believe, and as far as I am aware there is no way to avoid more or less immediate writes without turning journaling off, which rather defeats the point... One could work around it by putting [part of] /var/lib/boinc-client on a separate ext3 partition and playing with cache parameters, I suppose, but not everyone is a Linux guru...) The above said, it's good to know there might be some new work in the pipeline, and perhaps it'll be 64-bit and more tuned to modern hardware? Cheers - Al. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Early days, but: computer: i7-4770; one model month takes 2 calendar days to complete. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
OK, but it seems a little small. I will be taking a Ryzen 3600 out of summer lockdown in a month or so, and with a 20 Mbps (or 2.5 MB/s) upload speed, I won't have that much to do. But give it your best shot, please. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
The above said, it's good to know there might be some new work in the pipeline, and perhaps it'll be 64-bit and more tuned to modern hardware??? These will still be HADAM4 Met Office models, so 32-bit unfortunately. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
These are the hadam4h N216 models that have been the main batches we've been running lately on Linux. The ones we've been running on the main site have model month uploads of ~145 MB. In the newly tested version, the model month uploads will be ~195 MB, so about 35% more per upload. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
These are the hadam4h N216 models that have been the main batches we've been running lately on Linux. Thanks. I limit them to two (or maybe four) per machine for maximum efficiency. The old ones have run well on a Ryzen 3600 (virtual cores) or i7-9700 (full cores) with that, and I expect the new ones will too. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,363,137 RAC: 15,665 |
I'd probably be limiting to 2 working on my virtual machine. Seems to be most efficient timewise for data output. Broadband speed not a problem as we are on cable here. |
Send message Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 |
193 MB is huge if you are on a satellite or cell phone connection. This is not an issue for me currently, but in the past it would have been a show-stopping hurdle. At this point I would support it because I have bandwidth to spare. To be honest, CPU cycles are what I struggle with these days. I know that climate change will kill 90% of all life on Earth if we don't stop it. I also know that human civilization as we know it will not survive COVID-19 unless we find multiple ways to treat its symptoms and immunize against it. If we can't do that then humanity's governments will go insane and climate change will be on our Christmas Wish List. So I have every CPU I own (two ARM CPUs, one Intel, three AMD) and one more (ARM) that I ordered today working on WCG OpenPandemic tasks. Don't get me wrong. I have great faith in each and every one of you as a person. It's just that, as a species, I think we are dumber than a bag of hammers. So until I can stop worrying about the knock-on effects of a species so small I can't even see it, I will only do work for CPDN when there is a solution to COVID or a lull in the work flow. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
It's just that, as a species, I think we are dumber than a bag of hammers. We haven't exactly distinguished ourselves. I do a lot of COVID-19 work, but the anti-virals that will save us near-term are already under test now, and should be available by the end of the year. The computer studies, I think, are more relevant for the next-generation viruses. We need to start work now. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Don't get me wrong. I have great faith in each and every one of you as a person. It's just that, as a species, I think we are dumber than a bag of hammers. LOL. I've read this a lot recently (in various forms), and I couldn't agree more. This is especially evident in recent years with the proliferation of many so-called "news" media sources and social media. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,363,137 RAC: 15,665 |
Possibly more so some of the so-called leaders at the top and some of the people advising them. Look at what happened in the Spanish flu pandemic in 1918/19. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Well, it looks like no one is against bigger uploads, so the researchers can go ahead with the current model. |
Send message Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0 |
Will these larger models also use more RAM? If so can we get a hint as to the numbers we should expect? |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Will these larger models also use more RAM? If so can we get a hint as to the numbers we should expect? Nope, same as the current hadam4h N216 models, about 1.4 GB per task. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,706,019 RAC: 5,585 |
Well, it looks like no one is against bigger uploads, so the researchers can go ahead with the current model. What would be the checkpoint interval? I can't recall well, but checkpoint on my i7-4790 was 40-60 mins. Any considerations to reduce it a bit? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This project has just started, with one short run, and the only matter of interest at present is the extra data being collected, resulting in bigger uploads. Later in the year, the other matters raised may get answered. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Another batch of OpenIFS went out for testing last night. The upload after 25 minutes is 275 MB, so it takes longer to upload than to compute on my Ryzen, which would get interesting if that is still true when it finally makes it to the main site and big batches! (I suspect tasks will be much longer with similar-sized uploads, but I am in the land of guessing there!) Peak memory usage was 26%, so a bit over 8 GB, so I won't be running them on the old laptop! Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
All very interesting, but have you been able to run more than one at once? Is there any obvious slowdown due to it? |
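
For anyone who wants to redo the upload arithmetic from earlier in the thread (193 MB per model month, 4 model months per task, 16 tasks, about 60 KB/s upstream) with their own numbers, here is a rough sketch. It assumes decimal units (1 MB = 1000 KB) and a perfectly steady link, neither of which comes from the project, and the function names are made up purely for illustration.

```python
# Back-of-envelope upload estimate using the figures quoted in the thread.
# Assumptions (not from the project): decimal units (1 MB = 1000 KB,
# 1 GB = 1000 MB) and a perfectly steady upload link.
# Function names are hypothetical, chosen for this example only.

SECONDS_PER_DAY = 86_400


def batch_upload_mb(mb_per_model_month: float, model_months: int, tasks: int) -> float:
    """Total upload size in MB for `tasks` tasks of `model_months` months each."""
    return mb_per_model_month * model_months * tasks


def upload_days(total_mb: float, kb_per_second: float) -> float:
    """Days of continuous uploading needed at a steady rate given in KB/s."""
    seconds = total_mb * 1000 / kb_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 16 concurrent tasks, 4 model months each, 193 MB per model month.
    total = batch_upload_mb(193, 4, 16)
    print(f"16 tasks: {total / 1000:.1f} GB, "
          f"{upload_days(total, 60):.1f} days of uploading at 60 KB/s")

    # The same batch limited to 4 concurrent tasks.
    smaller = batch_upload_mb(193, 4, 4)
    print(f" 4 tasks: {smaller / 1000:.1f} GB, "
          f"{upload_days(smaller, 60):.1f} days of uploading at 60 KB/s")
```

Under those assumptions, 16 tasks come to a little over 12 GB, which would keep a 60 KB/s link busy for a bit under half of a five-day compute cycle, while 4 tasks need well under a day of uploading. Real throughput will vary, so treat the output as a ballpark only.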
©2024 cpdn.org