Thread 'New Work Announcements 2024'

Author	Message
Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 70288 - Posted: 2 Feb 2024, 14:26:46 UTC - in response to Message 70272. Last modified: 2 Feb 2024, 14:29:54 UTC Although you could get a larger case, I always use full towers. My machine is already a full tower. https://www.dell.com/support/manuals/en-us/precision-5820-workstation/precision_5820_om_pub/front-view?guid=guid-37c8fd9c-4ee2-4c39-89f9-061167ff006d&lang=en-us https://www.dell.com/support/manuals/en-us/precision-5820-workstation/precision_5820_om_pub/major-components-of-your-system?guid=guid-3f127ece-ad92-4fd6-bbbc-b6548ebd69c4&lang=en-us ID: 70288 · Reply Quote

Mr. P Hucker Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 70290 - Posted: 2 Feb 2024, 14:41:52 UTC - in response to Message 70288. My machine is already a full tower. Then I don't understand you not being able to fit a larger fan. I have a 6 inch (15cm) cube cooler in my Ryzen machines. Dual 150mm fans, almost silent. ID: 70290 · Reply Quote

wujj123456 Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,778,076 RAC: 63,852	Message 70293 - Posted: 2 Feb 2024, 18:17:57 UTC - in response to Message 70276. Setting defaults to 1-2 and resetting all current preference initially is reasonable to me, but I really hope we can honor override afterwards. This solves the problem of people never reading forums, while allowing people paying attention to use more cores on bigger machines once they have app_config updated. One caveat is that the setting is global, so it would also negatively affect WAH and HadAM4 even though they don't face the same memory problem. Other than WCG, I haven't seen per-app max jobs settings. I suppose it won't be a trivial change on server side to implement that, but if we could that would be the best IMO. ID: 70293 · Reply Quote
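For readers who have not used the client-side override mentioned above: BOINC reads an optional app_config.xml from the project's folder, and it can cap concurrent tasks per application, so a limit does not have to be global. A minimal Python sketch that writes such a file follows. The application names are placeholders, not the real CPDN short names (those would have to be looked up in client_state.xml), and the limits are arbitrary examples.

```python
import xml.etree.ElementTree as ET


def build_app_config(per_app_limits, project_limit=None):
    """Return app_config.xml text that caps concurrent tasks per application."""
    root = ET.Element("app_config")
    for app_name, limit in per_app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    if project_limit is not None:
        # Optional cap across all applications of this project.
        ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Placeholder app names and example limits (assumptions, not CPDN's real values):
# a tight cap for a memory-hungry app, a looser one for a lighter app.
config_text = build_app_config(
    {"example_big_memory_app": 2, "example_wah": 8},
    project_limit=8,
)
print(config_text)
```

The printed text would be saved as app_config.xml in the CPDN project folder inside the BOINC data directory and picked up after asking the client to re-read its config files.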

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 70345 - Posted: 9 Feb 2024, 12:43:14 UTC - in response to Message 70102. Copied from the old thread, from Glenn:
Forthcoming batches
The following batches are planned for Jan (or early Feb).
a/ Weather@Home (Windows)*
   NZ25 - New Zealand 25km grid, natural forcings.
   EAS25 - East Asia 25km grid, range of different forcings.
b/ HadAM4 (Linux)
   N216 climatological runs producing high frequency northern-hemisphere output.
c/ OpenIFS (Linux)
   Low resolution batch to look at variation of model results across different hardware.
*We'll also roll out updated versions of the apps for Weather@Home, HadAM4, & HadSM4 to fix issues with the models failing, particularly on restarts. Although we hope to get these out before the Weather@Home batches it may not happen due to time pressure from the projects funding these batches.
Hoping some of these might come sooner rather than later but I have given up holding my breath! Any further news of these? I need to return a result to get rid of the spurious RAC figure from the correction that was done months ago (308,000 where my boxes are capable of 80,000 at best). ID: 70345 · Reply Quote
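As a rough illustration of why a stale RAC lingers: the sketch below assumes RAC is an exponentially decaying average with a one-week half-life, and that the stored figure is only recalculated when new credit is granted (which would be why returning a result is needed). Both points are assumptions about standard BOINC server behaviour rather than anything confirmed in this thread, and the function names are made up for the example.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumed RAC half-life of one week


def decayed_rac(rac, idle_days, half_life_days=HALF_LIFE_DAYS):
    """RAC after `idle_days` of pure decay with no new credit."""
    return rac * 0.5 ** (idle_days / half_life_days)


def days_to_reach(rac, target, half_life_days=HALF_LIFE_DAYS):
    """Days of pure decay needed for `rac` to fall to `target`."""
    return half_life_days * math.log2(rac / target)


# Figures quoted in the post above: 308,000 shown, about 80,000 realistic.
print(decayed_rac(308_000, 7))
print(round(days_to_reach(308_000, 80_000), 1))
```

Under those assumptions the inflated figure would need roughly two half-lives of decay to fall back to the realistic level once it starts being updated again.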

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70346 - Posted: 9 Feb 2024, 13:38:21 UTC - in response to Message 70345. The priority is to get the replacement EAS25 batches out (for aborted 1002-1004) once the troublesome files have been corrected and tested. It's highly likely they will use the new WaH2 app, which has been in development and has been tested not to suffer from the excessive failures. It has already been added to the main site as v8.29 of the WAH2 Region Independent app (or wah2-ri for short). If you receive a workunit for wah2-ri 8.29, you're running the new app. George (aka geophi) noted in testing that the new app is ~10% faster than the old one. It's been done this way so as not to interfere with currently running wah2 workunits. There is also a new Linux version of wah2-ri which is currently in test (not on the main site yet). The OpenIFS batch is ready & has been tested, but I'm not happy with some of the failures coming from the monitor code (not the model). To be discussed. The HadAM4 batch is, I think, about ready; it may need a bit more testing. So, a lot of Windows & Linux work coming soon. I'll be able to confirm more next week after the usual Monday CPDN meeting. HTH --- CPDN Visiting Scientist ID: 70346 · Reply Quote

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 70347 - Posted: 9 Feb 2024, 19:29:18 UTC - in response to Message 70346. Many thanks :-) ID: 70347 · Reply Quote

David Berg Send message Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512	Message 70348 - Posted: 12 Feb 2024, 19:16:44 UTC - in response to Message 70346. I received two new tasks for EAS Batch 1006, running 8.29, this morning. One of the two NZ Batch 1005 tasks (running 8.24) I had running disappeared overnight. The other is still running. Not sure what happened to it or where to look to find out. ID: 70348 · Reply Quote

Richard Haselgrove Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361	Message 70349 - Posted: 12 Feb 2024, 19:25:39 UTC - in response to Message 70348. Not sure what happened to it or where to look to find out. You can look in your computer's task list from your home page on this website. All tasks for computer 1367467 Unfortunately, in this particular case, not much evidence has been preserved. ID: 70349 · Reply Quote

kotenok2000 Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080	Message 70350 - Posted: 12 Feb 2024, 19:27:38 UTC Last modified: 12 Feb 2024, 19:31:30 UTC I see lots of suspends in tasks that failed quickly. Did you set "Suspend when computer is in use" and define "in use" as "mouse and computer activity in last 0 minutes", making it stop and immediately resume? ID: 70350 · Reply Quote

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70351 - Posted: 12 Feb 2024, 21:38:04 UTC - in response to Message 70350. Also make sure "keep task in memory when suspended" option is selected (or whatever it's called). This prevents the models from constantly restarting as a new process and reading from the start files. If this is not enabled it increases the chance of a failure. --- CPDN Visiting Scientist ID: 70351 · Reply Quote

David Berg Send message Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512	Message 70352 - Posted: 12 Feb 2024, 22:29:28 UTC - in response to Message 70349. Thank you. I didn't know about this page. I see now how to navigate to it. I see many "Error[s] while computing." Are those errors manifested by my system over which I have some control, or errors within the model or data? ID: 70352 · Reply Quote

David Berg Send message Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512	Message 70353 - Posted: 12 Feb 2024, 22:32:12 UTC - in response to Message 70351. Thank you, Glenn. That option was not checked. I updated it now. ID: 70353 · Reply Quote

kotenok2000 Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080	Message 70354 - Posted: 12 Feb 2024, 22:33:48 UTC Fortunately the client was sending intermediate results, so progress wasn't lost. ID: 70354 · Reply Quote

David Berg Send message Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512	Message 70355 - Posted: 12 Feb 2024, 22:39:22 UTC - in response to Message 70350. Following are my Preferences. I am naive about how these preferences affect my processing of cpdn tasks. I am very open to suggestions to improve my configuration. My computer is no longer actively involved in much other work, so cpdn can rise to the top of my priority computing. Please advise changes/enhancements that you suggest I implement.
When computer is in use
  'In use' means mouse/keyboard input in last 3 minutes
  Suspend all computing
  Suspend GPU computing
  Use at most 75 % of the CPUs
  Use at most 50 % of CPU time
  Suspend when non-BOINC CPU usage is above 30 %
  Use at most 38 % of memory
When computer is not in use
  Use at most 75 % of the CPUs (requires BOINC 7.20.3+)
  Use at most 50 % of CPU time (requires BOINC 7.20.3+)
  Suspend when non-BOINC CPU usage is above 30 % (requires BOINC 7.20.3+)
  Use at most 75 % of memory
  Suspend when no mouse/keyboard input in last --- minutes
General
  Suspend when computer is on battery
  Switch between tasks every 60 minutes
  Request tasks to checkpoint at most every 60 seconds
  Leave non-GPU tasks in memory while suspended
  Store at least --- days of work
  Store up to an additional 0.25 days of work
  Compute only between ---
Disk
  Use no more than 100 GB
  Leave at least 0.001 GB free
  Use no more than 50 % of total
  Page/swap file: use at most 75 %
Network
  Limit download rate to --- KB/second
  Limit upload rate to --- KB/second
  Limit usage to --- MB every --- days
  Transfer files only between ---
  Skip data verification for image files
  Confirm before connecting to Internet
  Disconnect when done
ID: 70355 · Reply Quote

Alan K Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,992,465 RAC: 14,585	Message 70356 - Posted: 12 Feb 2024, 23:13:57 UTC - in response to Message 70355. You could try setting "use computer time" to 100% for both when in use and not in use to reduce the number of suspends ID: 70356 · Reply Quote

David Berg Send message Joined: 2 Jul 15 Posts: 21 Credit: 4,211,312 RAC: 1,512	Message 70357 - Posted: 12 Feb 2024, 23:20:41 UTC - in response to Message 70356. Thank you. I went ahead and did that. I also updated "Suspend when non-BOINC use exceeds ..." to 50%. ID: 70357 · Reply Quote
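The changes suggested so far in this thread (CPU time at 100%, the non-BOINC usage threshold at 50%, tasks kept in memory while suspended) can also be set locally on the machine. A rough Python sketch that writes them as a global_prefs_override.xml follows. The tag names are the ones I understand the BOINC client to use for these preferences; treat them as an assumption and check the BOINC preferences documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Settings discussed in the thread, keyed by the (assumed) BOINC tag names.
THREAD_SUGGESTIONS = {
    "cpu_usage_limit": "100",     # "Use at most ... % of CPU time"
    "suspend_cpu_usage": "50",    # "Suspend when non-BOINC CPU usage is above"
    "leave_apps_in_memory": "1",  # "Leave non-GPU tasks in memory while suspended"
}


def build_prefs_override(settings):
    """Return global_prefs_override.xml text for the given tag/value pairs."""
    root = ET.Element("global_preferences")
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


override_text = build_prefs_override(THREAD_SUGGESTIONS)
print(override_text)
```

Local overrides like this take precedence over the website preferences on that one machine only, so the web page will keep showing the old values.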

SolarSyonyk Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463	Message 70358 - Posted: 12 Feb 2024, 23:43:31 UTC - in response to Message 70352. Last modified: 12 Feb 2024, 23:44:33 UTC Thank you. I didn't know about this page. I see now how to navigate to it. I see many "Error[s] while computing." Are those errors manifested by my system over which I have some control, or errors within the model or data? "Yes." :/ In an ideal world, BOINC tasks should be well behaved and not care if they're repeatedly suspended/resumed, reloaded from checkpoints, etc. It may hurt the rate of progress on them, but they shouldn't crash, error out, or generate different results from a straight-through run. CPDN tasks tend not to be that well behaved (I believe a lot of them are simplified versions of supercomputing code), and generally don't like being suspended/resumed (large clusters tend to just run tasks until done). There is work ongoing to fix that (with what sounds like some good progress on that front!), but they're best started once and left to run until either they hit some impossible conditions (the planet has cooled so much the atmosphere is now liquefied), or they crash (which shouldn't happen, but does, and is being improved over time). If you suspend/resume the computer (S3 sleep, typically), it doesn't bother the tasks, and I do that regularly. But try to avoid causing the actual tasks to have to stop computation and resume regularly. That just doesn't work well right now. ID: 70358 · Reply Quote

Mr. P Hucker Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 70359 - Posted: 13 Feb 2024, 0:22:46 UTC - in response to Message 70355. CPDN likes you not to use hyperthreading: it doesn't speed things up much, and it makes the tasks take longer because you run twice as many at once. So if you're only doing CPDN, you're best setting it to use only 50% of the CPUs. If you're doing other projects too, it gets more complicated: you need to use the app_config file to tell BOINC that CPDN tasks "use 2 cores". ID: 70359 · Reply Quote
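The "use 2 cores" trick might look like the sketch below, assuming it refers to the avg_ncpus element inside an app_version block of app_config.xml. As I understand it, that changes how many CPUs the client budgets per task and does not make the model itself multi-threaded. The app name is a placeholder and would need replacing with the real short name from client_state.xml.

```python
import xml.etree.ElementTree as ET


def build_two_core_budget(app_name, ncpus=2):
    """Return app_config.xml text telling BOINC to budget `ncpus` CPUs per task."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    # Scheduling budget only: each task still runs as a single process.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# "example_cpdn_app" is a made-up name used purely for illustration.
budget_text = build_two_core_budget("example_cpdn_app")
print(budget_text)
```

With a budget of 2 CPUs per CPDN task, a 16-thread machine would run at most 8 such tasks while other projects can still fill the remaining threads.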

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762	Message 70360 - Posted: 13 Feb 2024, 6:53:54 UTC
#1006 | 6048 tasks | 2024-02-12 | ALL | WAH2_ri East Asia 25km
These are from the reworked code, so please do say whether these behave themselves and are less prone to crashes, as has been observed in testing. ID: 70360 · Reply Quote

Richard Haselgrove Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361	Message 70361 - Posted: 13 Feb 2024, 9:35:19 UTC - in response to Message 70360. I mentioned some time ago that my travelling laptop crashed a test task with the old app, with a signal 11 at startup. That host is approaching 10% on wah2_eas25_h0k1_201012_24_1006_012259529_0. I also have a tiny, low power, Celeron box (about the size and shape of a portable CD library) - picked up to test a 64-bit BOINC error on some low power processors, now resolved (host 1548871). That one is also running a task successfully, but has only reached 3% over the same timescale. ID: 70361 · Reply Quote