UK Met Office HadAM4 at N216 resolution
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
My 3.50GHz Haswell looks like taking about 14 days for these, even though BOINC is saying about 3.3 days. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,020,584 RAC: 20,684 |
"My 3.50GHz Haswell looks like taking about 14 days for these, even though BOINC is saying about 3.3 days." Similar percentage difference here. If the figure in the task files that determines the estimate is the same one that determines credit, we may need to mention this to the project? |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
The estimates for my i7-8700 are a bit strange. If you just add the Elapsed Time and Time Left, you get about 5 days. But if you look at the % completed (only 6.1%), it comes out to about 26 days. Normally that means the "Time Left" is wrong, and will adjust itself in due course by slowing down. But at the moment, it is still decreasing in real time. Eventually, one or the other will change to more consistent values. The "% completed" could be non-linear, and the final result somewhere in between. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
If you're running on the hyper cores, then it may be that. One of the researchers said some time back that doing that results in a lot of switching in the processor. I guess the code has something that likes/needs "real" cores. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Yes, it is on hyper cores. I can do real cores next, with a bit of memory juggling. I wanted to use my i7-9700 (8 real cores) anyway, but found that it was not stable with 64 GB of memory, at least not at the rated speed. But I now have new memory that might be more compatible. Or at least I can run the i7-8700 on real cores if need be. It should be ready by Christmas. EDIT: That much memory is not needed now, but I am planning for the OpenIFS. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've just noticed something interesting. One of the 4 models running, which are batch 842, is now 35 minutes behind the others. Also about 0.15% behind. It was the last to start, about 1 minute behind the 3rd one to start. This is my "general use computer", and I've noticed it's slow to react, or even frozen for a few seconds. 11.5 hours until the first lot of zips. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,020,584 RAC: 20,684 |
"This is my 'general use computer', and I've noticed it's slow to react, or even frozen for a few seconds." I have noticed this on my slow general use computer. But mine only has 2GB/core, which really isn't enough if much else is running at the same time as these tasks. I have restricted it to just one of the two cores, which has sorted that out. |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Dave and Les - I stumbled across a Linux package called xosview which shows some cool information about memory usage, paging, cache, and if a cpu is in a wio (waiting for I/O) state. Maybe you already knew of it. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I stumbled across a Linux package called xosview which shows some cool information about memory usage, paging, cache, and if a cpu is in a wio (waiting for I/O) state." https://sourceforge.net/projects/xosview/ I have not run this one. http://xosview.sourceforge.net/ |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
One thing I have learned by monitoring the writes is that enabling or disabling hyper-threading on my i7-8700 has no effect. The write rate stays exactly the same at 33.5 GB/day, on either six full cores or twelve virtual cores. So the total work output would be the same over a period of time in either case. So you might as well save memory and operate on six full cores, or in other words just set BOINC to run on 50% of the available cores. As for the times, that is still a bit of a mystery and I won't know until I complete some under a given set of circumstances, but probably around 13 days on full cores and twice that on virtual cores. My i7-9700 should do better, but it is still early. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Finally! My first lot of zips have shown up. A bit over 137 Megs. |
Send message Joined: 14 Mar 15 Posts: 1 Credit: 970,308 RAC: 12,438 |
Do you have a figure for time between checkpoints to disk? I guess 2 and a half hours? Preferable to keep those tasks in memory and not shut down too often... |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You can work out the figure for yours from the BOINC Properties list. Click on a model in the Tasks tab, then click on Properties to the left. A third of the way down the list is the time of the last checkpoint. Start writing down/watching, and soon you'll get what you want. ****************** Yes, these models are big, so the longer a computer can be left running the better. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
WB8ILI: No, but then I haven't gone looking for anything. I just leave them to get on with it. Mostly, anyway. I'll have a look at that program later. Jean: Thanks for the link. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Do you have a figure for time between checkpoints to disk? I guess 2 and a half hours? Preferable to keep those tasks in memory and not shut down too often..." You can enable "Checkpoint debug" under "Event Log Diagnostic Flags" or "Event Log Options" (depending on the version of BOINC). You can get at that from the "Advanced" or "Options" menu of BOINC Manager (also depending on the version of BOINC). Of course this is probably useless for the hadcm3s models, which checkpoint much, much more frequently. Keeping these big models in memory and not interrupting them frequently is definitely the right idea. My Ryzen 3600X running 4 models checkpoints about every 66 minutes per task. My i7-4790K does so every 106 minutes, also running 4 at a time. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
TL;DR - you probably don't want to run more than one of these per 4MB+ of L3 cache... Jim1328's time estimates for an i7-8700 prompted me to do some tests (see below) as my experience with the Microbiome application (MIP1) at WCG, which is also a memory hog, suggests that one should only run one instance of that per 4MB (or more[1]) of L3 cache; running more results in significant increases in cache misses, with a corresponding drop in overall CPU effectiveness (for any BOINC tasks running, not just the hogs!) -- indeed, running 4 at a time on a machine with 8MB cache resulted in CPU temperatures dropping by 10C or more and run times nearly double that of a single task (which I restricted using the max_concurrent mechanism). Testing on an i5-7600 (6MB L3 cache, 4 cores, no hyper-threading, 8GB RAM, 3.5GHz clock) has shown HadAM4@N216 to be a cache-wrecker as well (no surprise there). I did tests with 1 HadAM4 task, 2 HadAM4 tasks, 3 HadAM4 tasks, and my normal workload if I have a CPDN task - 1 CPDN, 2 WCG. Running a single HadAM4 task with no company yields a checkpoint every 81 minutes; running two at once yields checkpoints every 91 minutes; running three, checkpoints are about 110 minutes apart. This is consistent with changes in the number of instructions run in a fixed time interval, which I monitored with the perf stat command. As checkpoints seem to be taken once per model day and there are about 120 days per 4-month model, I'd reckon these would complete in about 6.8 days (running 1 at a time), 7.6 days (running 2 at a time) or 9.2 days (3 at a time). By the way, under my usual workload [avoiding MIP1 tasks as they mess up the cache too!], checkpoints are about 83 minutes apart, so it can be seen that the WCG tasks aren't really getting in the way. (If MIP1 tasks get in there, the checkpoints are about 86 minutes apart.) 
There's one thing in favour of running lots of these on a multi-core machine - your power draw will drop (as evidenced by CPU temperatures!) as the cores end up waiting for memory accesses more and more often! But I suspect there comes a point where each task takes so long to run that it's just not worth it - I, for one, will continue to treat CPDN as minority work on my Intel machines in order to maximize throughput. I'm about to take delivery of a Ryzen 3700X (32MB L3 cache, though I gather access is constrained to 8MB per 2 cores (4 threads)); I'll be interested to see how that behaves as and when it gets some CPDN work to do (and will probably do some bulk tests with WCG MIP1 to get an idea if there's no CPDN work available!) Cheers - Al. [1] Someone over at WCG seemed to think 5MB cache was what a MIP1 job would like. The user offered no justification for that number but 4MB probably isn't enough for near-optimum performance. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
The cache size definitely makes a difference as to how much the model speed slows down when loading more on. My 4790K has 8 MB of L3 cache and can run 1 N216 model at 13.9 sec/TS and 4 at 22 sec/TS. (58% slower) My 3600X has 32 MB of L3 cache and can run 1 N216 model at 11.2 sec/TS and 4 at 13.6 sec/TS. (21% slower) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's also this page: Xosview for downloading it in a terminal window. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It looks like the cache is the culprit. This will slow down those 64 and 128 core machines. Unless they're just crashing them because of the missing lib. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"I'm about to take delivery of a Ryzen 3700X (32MB L3 cache, though I gather access is constrained to 8MB per 2 cores (4 threads)); I'll be interested to see how that behaves as and when it gets some CPDN work to do (and will probably do some bulk tests with WCG MIP1 to get an idea if there's no CPDN work available!)" Thanks a lot for the cache info. I was beginning to think that the issues were deeper than I had found. I just happen to have a Ryzen 3700X, and was wondering what its large L3 cache would do here. But I would need to add more memory. So let us know, and I could do it. EDIT: I have found that as I add more N216 to my i7-9700, the run time estimates increase, as manually calculated. The first one was 5.5 days, and the last one is now 15.5 days. So the cache is implicated, since they are all full cores and so hyper-threading is not an issue. (As for MIP1, I have found that I need to limit it to two running at a time on any of my machines - Intel or AMD. Cache could certainly play a role, or how it is accessed.) |
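
The manual run-time estimates discussed in this thread can be reproduced with a few lines of Python. This is only a rough sketch, not anything taken from BOINC or CPDN: the function names are made up for illustration, the 1.6-day elapsed time is an assumed value (the thread gives only the 6.1% figure), and the checkpoint method relies on the posters' working assumption of one checkpoint per model day and about 120 model days per task.

```python
"""Back-of-envelope run-time estimates for HadAM4 N216 tasks (illustrative only)."""

MINUTES_PER_DAY = 24 * 60


def estimate_from_progress(elapsed_days: float, fraction_done: float) -> float:
    """Total run time in days if progress is linear: elapsed / fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_days / fraction_done


def estimate_from_checkpoints(minutes_per_checkpoint: float,
                              checkpoints_per_task: int = 120) -> float:
    """Total run time in days, assuming one checkpoint per model day."""
    return minutes_per_checkpoint * checkpoints_per_task / MINUTES_PER_DAY


def slowdown_percent(alone: float, loaded: float) -> float:
    """Percentage increase in a time metric (e.g. sec/timestep) under load."""
    return (loaded / alone - 1.0) * 100.0


if __name__ == "__main__":
    # Elapsed time of 1.6 days is an assumed value; 6.1% comes from the thread.
    print(f"progress-based: {estimate_from_progress(1.6, 0.061):.1f} days")

    # Checkpoint intervals reported for the i5-7600 with 1, 2 and 3 tasks.
    for n_tasks, minutes in [(1, 81), (2, 91), (3, 110)]:
        days = estimate_from_checkpoints(minutes)
        print(f"{n_tasks} task(s) at once: {days:.1f} days each")

    # sec/timestep figures reported for 1 model vs 4 models at a time.
    print(f"4790K: {slowdown_percent(13.9, 22.0):.0f}% slower")
    print(f"3600X: {slowdown_percent(11.2, 13.6):.0f}% slower")
```

The progress-based figure is only as good as the assumption that "% completed" advances linearly, which one poster above already doubts. The checkpoint-based figure sidesteps that, but depends on the checkpoint spacing staying constant over the whole run.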