Questions and Answers : Windows : Optimise PC build for CPDN
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0
I agree with Eirik, assuming he means UPS - uninterruptible power supply. That would probably have a bigger effect on the number of failures than ECC RAM versus non-ECC. About placement - I think it's fine to put the BOINC programs on the system disk. The CPDN programs live in the data folder, though, IIRC. If you have lots of spare disks you could consider putting the paging file on its own disk - although nowadays with RAM relatively cheap, paging is less common than it used to be.
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0
UPS. I have surge protection and have considered a UPS in the past, the thing putting me off being the efficiency loss of around 10% in a typical situation. It would help smooth out voltages, but my monitoring shows 217-238VAC on the supply (within the NZ nominal 230V +/-6%), and well within the capacity of any computer power supply. We do have power cuts (say 2/yr), but the level of model failure is far higher than this. So, if there are other reasons for using a UPS, I'm happy to be convinced. Oh, and a big yes to the other hints about OS updates etc - that is what I practise.

Buying a new PC is doing my head in and getting in the way of doing work!!! Prices for a Xeon system are high, but affordable, but getting info out of companies is like getting blood out of a stone. One thing I was wondering was whether I would be better off going for a slower 8-core or a fast 6-core, given that I will allow CPDN to run on half the cores. Like I said, I'm not really into the RAC race. HP priced a 6-core 3.2GHz Xeon, but I could, for a mere small fortune, go to an 8-core 2.6 or 2.4GHz Xeon. My gut feeling is that the total work unit throughput would be higher with the slower 8-core. Anyone have any data, or are we getting into the realms of "does it really matter at this point"? All systems have been priced with ECC memory.

Pagefile. Reading up on this earlier, even with loads of RAM, I believe it is still written to disc for protection and things like hibernation. Like I said below, even with 12GB of RAM its activity level is pretty high. But good advice, and yes, I'm going to put it on a separate HDD.
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164
[Eirik Redd wrote:] ... right now Darwin runs fastest (but maybe less accurately) ...

I don't know of any suggestion that accuracy varies between platforms. I have certainly found differences in reproducibility of results between platforms (i.e. what would be 'validity', if CPDN validated), but even that doesn't equate to accuracy, given the CPDN mantra of "output variations are equivalent to random input variations". It would surprise me if all runtimes were equivalently accurate in an absolute sense (e.g. in computing a 'log', 'cos' or 'sqrt' function), but whether that translates into an identifiably more accurate final model state is quite another matter.
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649
This is kinda techie - but on Linux (Ubuntu 12.04, anyhow) hadcm3n_6.07_i686-pc-linux-gnu depends on a shared library, libm.so.6 => /lib/i386-linux-gnu/libm.so.6, that is provided by the host. This is the well-documented math library - but the compiler has several options for how to call the host math libs, including, among others, -ffast-math, which takes shortcuts to run faster, or the much slower option for strict IEEE 754 behaviour, which guarantees last-decimal accuracy. I expect the compile is similarly dependent on the Windows and Darwin hosts - that matches what I remember from the extreme testing a few years ago to get the compile to work on all varieties of hosts - so it all depends on compiler options and on what math capabilities the host, and its DLLs, have. And some of the math libs figure out what host they are on and optimize (or short-cut, if you see it that way).

But these are all really minor variations. I've seen that on some combinations of hardware, software and DLLs, a model will sometimes fail a few timesteps earlier on a host with one set of math libs vs another with different math libs. Overall there is variation dependent on platform and specific versions of math libs - but not much difference in overall results.
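As an illustration (my own sketch, not CPDN's actual build flags - the loop and the choice of functions are arbitrary), you can see the effect by compiling the same C source twice with GCC, once strict and once with -ffast-math, and comparing the printed values; the last few digits may differ:

/* fastmath_demo.c - build twice and compare the output:
 *   gcc -O2 fastmath_demo.c -lm -o strict
 *   gcc -O2 -ffast-math fastmath_demo.c -lm -o fast
 * Relaxed IEEE rules and re-association under -ffast-math can shift
 * the accumulated result in the last few digits.
 */
#include <stdio.h>
#include <math.h>

int main(void) {
    double sum = 0.0;
    /* A long series of libm calls, so small per-call differences
     * have a chance to accumulate. */
    for (int i = 1; i <= 1000000; i++) {
        sum += sin((double)i) / sqrt((double)i);
    }
    printf("%.17g\n", sum);
    return 0;
}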
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0
UPS. I have surge protection and have considered UPS in the past, ...

I used to run a UPS a few years ago ... at the time I was getting something like 8 power cuts a month (usually just a second or two). But the supply has dramatically improved and it doesn't seem to be needed now.

... HP priced a 6-core 3.2GHz Xeon, but I could, for a mere small fortune, go to an 8-core 2.6 or 2.4GHz Xeon. My gut feeling is that the total work unit throughput would be higher with a slower 8-core. ...

Why not get the cheapest 6-core now, and get an 8-core in a couple of years when they are cheaper?

I'm a volunteer and my views are my own. News and Announcements and FAQ
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653
There is another option or two when addressing the high writes to an SSD, which I learned when dealing with the similarly high writes of the CEP2 project on World Community Grid.

The one I use at the moment is to place the BOINC data folder on a ramdisk. I currently use PrimoRamdisk (from Romex Software) for its relatively fast startup and shutdown, since you need to save and then reload the contents of the ramdisk each time you reboot. Another option is Dataram RAMDisk, which has a free version for disk sizes less than 4 GB - that should be plenty for most BOINC projects.

The second option is to use a caching program with a write cache (the read cache is unnecessary for SSDs, but could be helpful for a mechanical disk drive). I have used FancyCache (also from Romex), which has a free beta at the moment. If you set the write cache to maybe 1 GB or so and set the latency to an hour or more (I usually used 24 hours), you get a very large (e.g., 99%) reduction in writes to the disk.

But remember you are then storing the BOINC data in main memory, so if you get a crash you lose it, at least in the case of a ramdisk. In the case of a write cache, you lose everything in the cache from the time of the last cache flush to disk, but you still retain the basic data, so it is easier to recover from. I use an uninterruptible power supply (UPS) with automatic software shutdown of the PC to prevent loss in case of a power outage, and I have a very stable PC. If you overclock and crash a lot, these are not good options.
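To see where a figure like 99% comes from, here is a toy model (my own back-of-envelope sketch, not FancyCache's actual algorithm; the checkpoint rate and file count are assumptions): every rewrite of the same checkpoint file that lands inside one flush window collapses into a single disk write.

/* cache_sim.c - toy write-back cache arithmetic. */
#include <stdio.h>

int main(void) {
    double checkpoints_per_hour = 60.0; /* assume one checkpoint a minute */
    double files_per_checkpoint = 4.0;  /* assume 4 files rewritten each time */
    double hours = 24.0;
    double flush_interval_hours = 1.0;  /* the cache latency setting */

    double without_cache = checkpoints_per_hour * files_per_checkpoint * hours;
    /* With the cache, each distinct file reaches the disk once per flush. */
    double with_cache = files_per_checkpoint * (hours / flush_interval_hours);

    printf("disk writes without cache: %.0f\n", without_cache);
    printf("disk writes with cache:    %.0f\n", with_cache);
    printf("reduction: %.1f%%\n", 100.0 * (1.0 - with_cache / without_cache));
    return 0;
}

With these assumed numbers it prints a 98.3% reduction; longer latencies push it higher, at the cost of more data at risk in a crash.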
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0
Mike suggested:

Why not get the cheapest 6-core now, and get an 8-core in a couple of years when they are cheaper?
Well, I guess that comes down to why I'm running CPDN in the first place. But basically I believe the CPDN community needs answers sooner rather than later. I've been working on energy issues since 1985 for a variety of organisations, ranging from low income communities to some of the largest corporations. During this time my driving force has been energy 'conservation' (to quote an old fashioned term), but over the last 10 years the issue of climate change came to be a significant motivator. I was a late starter on that one! I could go on forever about this, but it's not the point of this thread.

So, I decided that although the 8-core is disproportionately expensive, I can afford the system, it will not use significantly more energy (the Xeon processor is actually rated at lower wattage than my old i7), and as a business PC it gets written off against tax. But I would have done it anyway. :-)

UPS. Still undecided on that one, but as the Met Service is predicting the heaviest snow for the last 20 years in the next few days, perhaps I should think again. Almost certain to lose power, which is great, because it means you can sit around the log burner reading the odd book or two. Oh, and throw snowballs. ;-))

Just for the record, I'm going for the following, built by a local company specialising in server builds, with the processor making up nearly 50% of the cost. Given the cost, I decided not to build it myself, and they provide a 3 yr guarantee.
LGA 2011 motherboard (of course)
Intel 520 series SSD for OS & programs (incl BOINC), plus a second one as a working scratch disk for non-BOINC work
Various work HDDs
32GB ECC 1600MHz RAM (overkill I know, but it's cheap - although I have to wait till the end of the month before the memory arrives in the country! Geez, NZ is a hick place at times.)
CPDN data files on their own HDD
Page file on a separate HDD (one of the work data drives probably)
BOINC will run as a service - as I always have. Means I can log out and leave CPDN running.
Nightly backups. But this excludes any CPDN data, and the BOINC service is automatically stopped and restarted for this activity.
Will allow CPDN to run on 50% of the cores.
Hyperthreading? Probably, as I don't recall spotting this causing problems, although it only seems to give a small advantage.
No overclocking.
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0
Jim said:

I use an uninterruptible power supply (UPS) with automatic software shutdown of the PC to prevent loss in case of a power outage...

With the auto shutdown, can you set the software to do certain tasks, e.g. stop the BOINC service (when running BOINC as a service, of course), or shut down the standard BOINC program, before shutting down the OS? I ask as there have been previous thoughts that you should always shut down BOINC before shutting down the OS, as the OS does not always allow BOINC sufficient time to shut down the CPDN threads safely.
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653
With the auto shutdown, can you set the software to do certain tasks, e.g. stop the BOINC service (when running BOINC as a service, of course), or shut down the standard BOINC program, before shutting down the OS?

I don't see any specific provisions for shutting down particular programs in either my APC PowerChute or my CyberPower PowerPanel software that initiates the PC shutdown in case of a power outage. But surely (?) you can shut down your PC from the Start button in Windows without problems. At least it has always worked fine for me, shutting down BOINC when I am running WCG/CEP2 (and also Folding@home), but I don't have any specific experience of power outages with CPDN. I just started up CPDN again, and the thunderstorm season has yet to do much damage here. But I have certainly rebooted the PC manually with no problem, and it should be the same thing. I wonder if it depends on your OS and/or disk drive, though. Win7 64-bit and a reasonably fast Samsung SSD work for me.
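If your UPS software can run a program before it powers the machine down, one possibility is a small helper that stops the BOINC service cleanly first. This is a minimal Win32 sketch of my own (not a feature of PowerChute or PowerPanel), assuming BOINC is installed as a service under the name "BOINC"; it needs admin rights to stop a service. Build with MinGW: gcc stop_boinc.c -ladvapi32 -o stop_boinc

/* stop_boinc.c - ask the Service Control Manager to stop the BOINC
 * service and wait until it has actually stopped, so the CPDN tasks
 * get time to checkpoint and exit cleanly.
 */
#include <windows.h>
#include <stdio.h>

int main(void) {
    SERVICE_STATUS status;
    SC_HANDLE scm = OpenSCManagerA(NULL, NULL, SC_MANAGER_CONNECT);
    if (!scm) {
        fprintf(stderr, "OpenSCManager failed: %lu\n", GetLastError());
        return 1;
    }
    SC_HANDLE svc = OpenServiceA(scm, "BOINC",
                                 SERVICE_STOP | SERVICE_QUERY_STATUS);
    if (!svc) {
        fprintf(stderr, "OpenService failed: %lu\n", GetLastError());
        CloseServiceHandle(scm);
        return 1;
    }
    /* Request the stop, then poll until the service reports STOPPED. */
    if (ControlService(svc, SERVICE_CONTROL_STOP, &status)) {
        while (status.dwCurrentState != SERVICE_STOPPED) {
            Sleep(1000);
            if (!QueryServiceStatus(svc, &status)) break;
        }
    }
    CloseServiceHandle(svc);
    CloseServiceHandle(scm);
    return 0;
}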
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653
I should mention, before getting too far down the road, that one potential problem with a ramdisk is that the size of the CPDN files (in the BOINC data folder) keeps getting larger. Insofar as the operation of a given set of work units is concerned, that can be accommodated by choosing a ramdisk large enough to begin with, if you have enough main memory. The real problem comes later, since according to the FAQ the various projects do not clean out all their stuff, but leave it there, to varying degrees, for possible later use. Unless you want to delete old files at the end of every run, it looks like a better solution is the cache (e.g., FancyCache). You could set the write cache to a large enough size (a few GB) to handle the work in progress, and only the remainder would get written to the disk drive when the cache is flushed. That is probably my next project.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Unfortunately the re-write of the FAQ pages wasn't completed before the new front pages went live. Most of it still dates back to 2004, when the project went from classic CPDN to BOINC CPDN. So, some facts:

1) In the beginning, when the research was all at the University of Oxford, and the people there were mostly exploring parameter space to see where models crashed, not all of the data was returned to the project. Some was retained on people's computers for possible return later if the model proved interesting. But there were so many tantrums from people saying that they never signed up to be a data store (in spite of being told that keeping it wasn't compulsory) that when the project moved on to the next phase in 2006, data was no longer left on people's computers. Now if a model completes its designed run length, it will delete its files afterwards. But if computer problems crash it, the clean-up routine is never reached. Also, those that complete but won't stop running leave remnants behind when they are aborted. In these cases, the user must manually delete the folders.

2) The coupled ocean models build up a large number of small data files before these get zipped and returned. This can amount to a few gigs each. One person recently was missing the program that does the zipping and returning, so the files built up to over 10 gigs. And there are now monster machines with 24 processors running models, so a huge amount of data will be normal for them.

Backups: Here
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653
OK, good. I was hoping they would adopt some sort of mandatory clean-up policy at some point, considering the limitations of SSDs. I just need to experimentally determine the maximum size when running four tasks simultaneously on an Ivy Bridge i5-3550. If a ramdisk will do it, that works for me.
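One way to measure that (a rough sketch of my own, nothing CPDN-specific): sum the file sizes under the BOINC data folder, run it periodically while the four tasks progress, and note the peak when sizing the ramdisk.

/* dirsize.c - recursively sum the size of every file under a directory,
 * e.g. the BOINC data folder. POSIX dirent API; MinGW also ships
 * dirent.h on Windows. Build: gcc dirsize.c -o dirsize
 */
#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>

static long long dir_size(const char *path) {
    long long total = 0;
    DIR *d = opendir(path);
    if (!d) return 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char sub[4096];
        snprintf(sub, sizeof sub, "%s/%s", path, e->d_name);
        struct stat st;
        if (stat(sub, &st) != 0) continue;
        if (S_ISDIR(st.st_mode))
            total += dir_size(sub);          /* recurse into project folders */
        else
            total += (long long)st.st_size;  /* count regular files */
    }
    closedir(d);
    return total;
}

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "."; /* pass your BOINC data dir */
    printf("%s: %.1f MB\n", path, dir_size(path) / (1024.0 * 1024.0));
    return 0;
}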
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649
In my experience, the only thing that gets a new machine better productivity is a faster core. Faster disks, faster memory - zero more production. Zero.

My ancient Core 2 Duo at 3.0 GHz is still getting 25920 seconds per trickle (on Linux - see my post on math libs). My new i7-3770 at 3.4 GHz, ignoring the hyperthreading - when I choke it down to only 4 WUs, i.e. one WU per real core - will be a little bit faster per core than the ancient Core 2 Duo at 3GHz.

I've tried faster memory, faster disks - gets nothing. The core speed is all that matters for CPDN. Believe it. Or look at the fastest machines. Overclock? Maybe - but overclocking loses even in the short run.
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0
Eirik writes:

...the only thing that gets a new machine better productivity is a faster core. ....

Well, yes and no. If you look at the 'Top Computers' list, it is led by a computer running an AMD Opteron(tm) Processor 6176. Curious, as I know nothing about AMD, I looked it up, and if I am reading it correctly, it runs at a lowly 2.4GHz, but the rig is listed as having 48(!!!!) cores - no wonder it tops the list. Forgive me if I got this wrong, but it serves as an illustration for the following.

OK, if you have a system and want to improve it without major changes, then yes, a processor speed upgrade is an absolutely valid option. If on the other hand you are looking at a completely new system, as I am, then a different approach is equally valid, and will give higher total throughput. As mentioned earlier, I took the approach of running as many tasks as possible on one computer within my budget and energy envelope, and as such will hopefully have a higher throughput of tasks than if I had just concentrated on processor speed. Simple maths indicates this will be the case, if we assume the same percentage of cores run on each CPU. So instead of running e.g. 4 tasks on my i7, I will be able to run 8 on the Xeon (with hyperthreading). My 'ancient' i7 runs at 2.67GHz; the new system with a Xeon E5-2670 will run at 2.6GHz. That's 8 x 2.6 = 20.8 core-GHz against 4 x 2.67 = 10.7 core-GHz, so I should get roughly double the number of tasks through to CPDN. If I had gone from say 2.67 to 3.1GHz on my i7 (assuming all else the same), I would only have gained a 16% increase. But even that could not have been achieved, as no-one in NZ stocks LGA1366 processors anymore. That's built-in redundancy for you - grrrr.

In actual fact I would expect more than double the throughput, as the proposed Xeon processor is several generations newer and hopefully more efficient (computationally) than my i7-920, which was one of the first in the i7 series. Time will tell, I guess.
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649
What I was trying to say: the performance per core hasn't gotten much faster lately. If you can get more cores at a reasonable price - and they do use less power per core these days - that's good.

--edit-- Looks like the rig you are planning is real good on the reliability factor - that's the important part. Don't waste money on faster memory or disks, however - tried that, it doesn't help with the CPDN workload. Go for reliability.

Best luck - hope your new rig chomps the numbers. e
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649
So there is speed, reliability, and affordability. Can't have all three. Posting this account of my dream machine, for when I have a few million to spare: the IBM z-series has 5GHz+ cores, MTBF measured in decades, can virtualize almost anything, super-redundancy, and support included in the price. But the price per core is roughly 2000 times the price of a reliable Xeon. Oh well. I'll go for cheap and reasonably reliable at 400-2000 USD per box, depending on speed and core count.
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0
An update for those that are interested. Quite a bit here if you manage to get to the end.

Well, after a long time I have a new Xeon workstation (ID 1285327), but the initial tasks have all crashed, the last two because I pulled the wrong plug out of the wall. Dumb, dumb, dumb! Two others with code 22 definitely look to be model errors, and the other two with 193 I can't determine. Hopefully we are past the crash stage and things will now go well. I seem to remember a list of all the main exit codes as a sticky somewhere in the forum, but for the life of me I can't locate it, though I'm sure I'd spotted it a few weeks ago. Can anyone help on that one?

As to the build - this took over a month in itself, and I'm glad I didn't do it myself. The builder went for the Gigabyte GA-X79S-UP5-WIFI as it really does tick all the boxes. HOWEVER, it would never get past 3 hours in a burn test before the power supply shut down. ALL components were changed and THREE motherboards tried, all with the same result. They finally gave up and went with an Asus MB. Digging around, this is a known issue with this motherboard, and I CANNOT recommend it for use with CPDN-type tasks. The final straw was when ASUS told them to start lowering the voltages on the MB to make it more stable, at which point I said no. Things like this should just work.

For those that are interested, the final spec for the CPDN-related parts of the build is:
CPU: Xeon E5-2680
Memory: 32GB ECC (Kingston ValueRAM Server Premier KVR16E11/8, 4 x 8GB = 32GB total)
HDD OS & programs: Intel 520 240GB SSD
HDD CPDN/BOINC: existing Seagate 500GB HDD. Nothing else runs from this HDD.
6 other drives, including another SSD for use as a scratch disk.
Power consumption under various loads:
Prime95: 180 W
OCCT: 210 W
CPDN 4 tasks: 130 W
CPDN 6 tasks: N/A (180 W old PC)
CPDN 8 tasks: 155 W
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Nice write-up, Martin.

"We have met the enemy and he is us." -- Pogo

Greetings from coastal Washington state, the scenic US Pacific Northwest.
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0
Thanks for the write-up, Martin. I don't recall a list of error codes on this discussion board, but the BOINC FAQ service, http://boincfaq.mundayweb.com/index.php, has a section devoted to them (section 6).