Message boards : Number crunching : New Work Announcements 2024
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Although you could get a larger case, I always use full towers. My machine is already a full tower. https://www.dell.com/support/manuals/en-us/precision-5820-workstation/precision_5820_om_pub/front-view?guid=guid-37c8fd9c-4ee2-4c39-89f9-061167ff006d&lang=en-us https://www.dell.com/support/manuals/en-us/precision-5820-workstation/precision_5820_om_pub/major-components-of-your-system?guid=guid-3f127ece-ad92-4fd6-bbbc-b6548ebd69c4&lang=en-us |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"My machine is already a full tower." Then I don't understand why you can't fit a larger fan. I have a 6 inch (15cm) cube cooler in my Ryzen machines. Dual 150mm fans, almost silent. |
Joined: 14 Sep 08 Posts: 127 Credit: 41,778,076 RAC: 63,852 |
Setting the defaults to 1-2 and resetting all current preferences initially seems reasonable to me, but I really hope we can honor overrides afterwards. This solves the problem of people who never read the forums, while allowing people who pay attention to use more cores on bigger machines once they have updated app_config. One caveat is that the setting is global, so it would also negatively affect WAH and HadAM4 even though they don't face the same memory problem. Other than on WCG, I haven't seen per-app max jobs settings. I suppose it won't be a trivial change to implement on the server side, but if we could, that would be best IMO. |
Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
Copied from the old thread, from Glenn. Any further news of these? I need to return a result to get rid of the spurious RAC figure from the correction that was done months ago (308,000, where my boxes are capable of 80,000 at best). |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
The priority is to get the replacement EAS25 batches out (for aborted 1002-1004) once the troublesome files have been corrected and tested. It's highly likely they will be using the new WaH2 app which has been in development & tested not to suffer from the excessive failures. It has already been added to the main site as v8.29 of the WAH2 Region Independent app (or wah2-ri for short). If you receive a workunit for wah2-ri 8.29, you're running the new app. George (aka geophi) noted in testing the new app is ~10% faster than the old one. It's been done this way so as not to interfere with currently running wah2 workunits. There is also a new Linux version of wah2-ri which is currently in test (not on the main site yet). The OpenIFS batch is ready & has been tested, but I'm not happy with some of the failures coming from the monitor code (not the model). To be discussed. The HadAM4 batch is, I think, about ready; it maybe needs a bit more testing. So, a lot of Windows & Linux work coming soon. I'll be able to confirm more next week after the usual Monday CPDN meeting. HTH --- CPDN Visiting Scientist |
Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
"The priority is to get the replacement EAS25 batches out (for aborted 1002-1004) once the troublesome files have been corrected and tested. It's highly likely they will be using the new WaH2 app which has been in development & tested not to suffer from the excessive failures. It has already been added to the main site as v8.29 of the WAH2 Region Independent app (or wah2-ri for short). If you receive a workunit for wah2-ri 8.29, you're running the new app. George (aka geophi) noted in testing the new app is ~10% faster than the old one." Many thanks :-) |
Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512 |
I received two new tasks for EAS Batch 1006, running 8.29, this morning. One of the two NZ Batch 1005 tasks (running 8.24) I had running disappeared overnight. The other is still running. Not sure what happened to it or where to look to find out. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
"Not sure what happened to it or where to look to find out." You can look in your computer's task list from your home page on this website: All tasks for computer 1367467. Unfortunately, in this particular case, not much evidence has been preserved. |
Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
I see lots of suspends in tasks that failed quickly. Did you set "Suspend when computer is in use" and define "in use" as "mouse/keyboard input in last 0 minutes", making it stop and immediately resume? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Also make sure "keep task in memory when suspended" option is selected (or whatever it's called). This prevents the models from constantly restarting as a new process and reading from the start files. If this is not enabled it increases the chance of a failure. --- CPDN Visiting Scientist |
Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512 |
Thank you. I didn't know about this page. I see now how to navigate to it. I see many "Error[s] while computing." Are those errors manifested by my system over which I have some control, or errors within the model or data? |
Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512 |
Thank you, Glenn. That option was not checked. I updated it now. |
Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
Fortunately the client was sending intermediate results, so progress wasn't lost. |
Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512 |
Following are my Preferences. I am naive about how these preferences affect my processing of CPDN tasks, and I am very open to suggestions to improve my configuration. My computer is no longer actively involved in much other work, so CPDN can rise to the top of my computing priorities. Please advise any changes or enhancements you suggest I implement. When computer is in use: 'In use' means mouse/keyboard input in last 3 minutes; Suspend all computing; Suspend GPU computing; Use at most 75 % of the CPUs; Use at most 50 % of CPU time; Suspend when non-BOINC CPU usage is above 30 %; Use at most 38 % of memory. When computer is not in use (the first three settings require BOINC 7.20.3+): Use at most 75 % of the CPUs; Use at most 50 % of CPU time; Suspend when non-BOINC CPU usage is above 30 %; Use at most 75 % of memory; Suspend when no mouse/keyboard input in last --- minutes. General: Suspend when computer is on battery; Switch between tasks every 60 minutes; Request tasks to checkpoint at most every 60 seconds; Leave non-GPU tasks in memory while suspended; Store at least --- days of work; Store up to an additional 0.25 days of work; Compute only between ---. Disk: Use no more than 100 GB; Leave at least 0.001 GB free; Use no more than 50 % of total; Page/swap file: use at most 75 %. Network: Limit download rate to --- KB/second; Limit upload rate to --- KB/second; Limit usage to --- MB every --- days; Transfer files only between ---; Skip data verification for image files; Confirm before connecting to Internet; Disconnect when done. |
Joined: 22 Feb 06 Posts: 491 Credit: 30,992,465 RAC: 14,585 |
You could try setting "Use at most ... % of CPU time" to 100% for both "in use" and "not in use" to reduce the number of suspends. |
Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512 |
Thank you. I went ahead and did that. I also updated "Suspend when non-BOINC use exceeds ..." to 50%. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
"Thank you. I didn't know about this page. I see now how to navigate to it." As to whether those errors come from your system or from the model or data: "Yes." :/ In an ideal world, BOINC tasks should be well behaved and not care if they're repeatedly suspended/resumed, reloaded from checkpoints, etc. It may hurt their rate of progress, but they shouldn't crash, error out, or generate results different from a straight-through run. CPDN tasks tend to not be that well behaved (I believe a lot of them are simplified versions of supercomputing code), and generally don't like being suspended/resumed (large clusters tend to just run tasks until done). There is work ongoing to fix that (with what sounds like some good progress on that front!), but they're best started once and left to run until either they hit some impossible conditions (the planet has cooled so much the atmosphere is now liquified), or they crash (which shouldn't happen, but does, and is being improved over time). If you suspend/resume the computer (S3 sleep, typically), it doesn't bother the tasks, and I do that regularly. But try to avoid causing the actual tasks to have to stop computation and resume regularly. That just doesn't work well right now. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
CPDN prefers that you not use hyperthreading: it doesn't speed things up much, and each task takes longer because you're running twice as many at once. So if you're only doing CPDN, you're best off setting it to use only 50% of the CPUs. If you're doing other projects too, it gets more complicated: you need to use the app_config file to tell BOINC that CPDN tasks "use 2 cores". |
Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
Batch #1006: 6048 tasks, 2024-02-12, ALL, WAH2_ri East Asia 25km. These are from the reworked code, so please do say whether they behave themselves and are less prone to crashes, as has been observed in testing. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
I mentioned some time ago that my travelling laptop crashed a test task with the old app, with a signal 11 at startup. That host is approaching 10% on wah2_eas25_h0k1_201012_24_1006_012259529_0. I also have a tiny, low power, Celeron box (about the size and shape of a portable CD library) - picked up to test a 64-bit BOINC error on some low power processors, now resolved (host 1548871). That one is also running a task successfully, but has only reached 3% over the same timescale. |
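
Several posts above mention using app_config to cap how many CPDN tasks run at once, or to have BOINC budget two cores per task. As a rough sketch only, the Python below builds such a file with the standard library. The element names (`app_config`, `app`, `max_concurrent`, `app_version`, `avg_ncpus`) follow the BOINC client's app_config.xml format as I understand it; the app name `wah2_ri` and both numbers are placeholders, not values confirmed anywhere in this thread. The real short app name should be checked against what your own client reports before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent, avg_ncpus):
    """Return app_config.xml text for one app.

    app_name is a placeholder: it must match the short app name your
    BOINC client actually reports for the project (an assumption here).
    """
    root = ET.Element("app_config")

    # Cap on how many tasks of this app may run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # Ask the client to budget this many CPUs for each task of the app.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)

    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values: at most 2 tasks at once, 2 cores budgeted each.
    print(build_app_config("wah2_ri", max_concurrent=2, avg_ncpus=2))
```

The output would be saved as app_config.xml in the project's folder under the BOINC data directory, after which the client has to re-read its config files (or be restarted) to pick it up. Budgeting two cores per task does not make a single-threaded model run on two cores; it only stops the client from scheduling other work on the second one.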
©2024 cpdn.org