Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. |
Send message Joined: 17 Jan 09 Posts: 124 Credit: 2,040,263 RAC: 2,573 |
It's amazing how much math my generation can do in our heads compared to kids today that need a calculator to do the most rudimentary arithmetic. Pretty profound, but it rings with truth. Bill F |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
I assume the UK Met Office owns the code. Or is it someone else? The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. The biggest problem I see is the CPU cache congestion problem. Running too many WUs on a computer slows it down to a snail's pace. I keep playing around trying to figure out the most CP work units I can run on a computer. I've tried disabling hyperthreading and that works better but I still can't run all CPUs because it still slows down. Besides, if I can't run every CPU thread with CP then I'd like to support ARP etc. Right now as my older WUs complete I detach from CP and then reattach to sweep up the debris it leaves behind. Then I specify a max of two CPUs and under BOINC preferences use at most 33/36=92%. That leaves some headroom but it's still noticeably faster if I run only one CP WU. It's frustrating when I know I could be running 18 or more if not for the CPU Congestion Issue. Last time I suggested this someone said they'd have to rewrite a million lines of Fortran. I'm not a coder but I would think they'd only need to modify aspects of the code. https://www.ibm.com/docs/en/aix/7.2?topic=implementation-design-coding-effective-use-caches "Repackaging techniques can yield significant improvements without recoding..." https://hackernoon.com/programming-how-to-improve-application-performance-by-understanding-the-cpu-cache-levels-df0e87b70c90 This guy says his code ran 50x faster after optimizing for CPU cache usage. I've even seen a book dedicated to efficient CPU cache coding. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
What improvements do you have in mind? Nothing even comes close to fixing the CPU cache issue but a few upgrades could make this project a whole lot more user-friendly. I'd start by fixing the work delivery bugs. Several projects use the "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. Also fix the perpetual 60-minute project backoff. It makes no sense how work is delivered; it's just feast or famine. I either go days or weeks getting no WUs on a particular computer, even though the Server Status page says there's work available and another computer is getting work, or I get a year's worth of work in one delivery and must abort almost all of it. I can't think of another BOINC project that behaves this way. 16946 climateprediction.net 11/2/2021 2:14:19 PM update requested by user 16950 climateprediction.net 11/2/2021 2:14:25 PM Sending scheduler request: Requested by user. 16951 climateprediction.net 11/2/2021 2:14:25 PM Not requesting tasks: don't need (CPU: ; NVIDIA GPU: ) 16952 climateprediction.net 11/2/2021 2:14:27 PM Scheduler request completed 16953 climateprediction.net 11/2/2021 2:14:27 PM Project requested delay of 3636 seconds "Don't need" is not true. I have one 921 WU running and would like to run another. If I do get lucky and I'm blessed with a second WU I'd switch to "No new work" and switch back after one completed. Then if it's at all possible make the checkpoints closer together. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
or I get a year's worth of work in one delivery and must abort almost all of it. I have never received close to even six months of work even when work cache set to maximum. "Preferences for this project" page to allow the BOINCer to specify how many WUs of which project they'd like to download and maintain on their computer. In the past, CPDN used to allow users to specify which types of task they could receive, N216, N144 etc., though this was before those particular types of task made it onto the drawing board, but you get what I mean. I and at least one or two of the other moderators would like this but we have been told it isn't going to be changed, at least in the short term. I assume I have never had some of the scheduling problems you have because I only run projects other than CPDN when there is no work available here. Windows tasks all get snapped up within a couple of days of appearing or even less, so on that front the only way more work can be done is for more scientists to want to do the areas of research that are suited to that task type. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,053,847 RAC: 14,696 |
In the computing preferences menu item in "Options" there is a box: "checkpoint at most every.... seconds". |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
I assume the UK MetOffice owns the code. Or is it someone else? The main problem with that is not owning the source code - they’re not allowed to make changes to most of it. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. Yes, it’s the Met Office, not CPDN or BOINC or the researchers we are helping. The Met Office have no involvement in what we are doing and optimise their code to run on their mainframes. The licence we are using to run the code does not allow us to change it to suit our PCs. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And the researchers are well aware that these models take a long time to run. This "BOINC stuff" is only a small part of the research, more "a special treat", rather than "the main course(s)". |
Send message Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738 |
This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application, Boinc client can't force checkpoints to happen. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
That does nothing. Mine is set to 10 minutes. Then if it's at all possible make the checkpoints closer together. In the computing preferences menu item in "Options" there is a box: "checkpoint at most every.... seconds". |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,723,209 RAC: 7,531 |
Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
or I get a year's worth of work in one delivery and must abort almost all of it. I have never received close to even six months of work even when work cache set to maximum. I've gotten a year's worth of work several times, most recently a couple of days ago. The main point is to specify the number of WUs to send. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
And the researchers are well aware that these models take a long time to run. And it really shows by how poorly they run a BOINC server. They're so lazy they don't even send out a Server Abort when they abandon a project. Last night I completed 7 N144 WUs and they called them Abandoned. That's shameless. That's about seven CPU months of work I could've done for a project that actually cares. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Are you saying it's BOINC's fault that Oxford did not send out a Server Abort signal when they abandoned the N144 project??? Not only does this project have the worst performing work server in all BOINCdom it's so rude. I just turned in 7 N144 completed tasks and they were recorded as Abandoned. That'll be the fault of the server software supplied by BOINC, rather than anything CPDN has done. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
So how do I know that any of my work will actually be used??? How do I prevent wasting my time and money doing futile work??? |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application, Boinc client can't force checkpoints to happen. Why would people want checkpoints closer together? If you have 8 Boinc tasks running and you could set the checkpoint interval to 8 minutes, you would be writing a checkpoint every minute on the average. How much load do you want to put on your disk system? I figure out how much I would want to re-run in case of problems. Since N216 tasks take me about a week, I would normally make the interval an hour or so. |
Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
This option can only increase the time between checkpoints, not decrease it. The checkpoint interval is coded into the application, Boinc client can't force checkpoints to happen. In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
In the case of CPDN my systems checkpoint every 2 hours or so. If you don’t leave your system crunching 24/7 then you might well wish that to be a shorter period so that you lose less work each time you shut down. I did not think of people shutting their machines down often. I leave my machine up 24/7, except for updates requiring reboots, which I do every week or two. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
From an old memory, I think that the climate models checkpoint at the end of each model year. |
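
The checkpoint trade-off argued over in the last few posts (disk load versus work lost on shutdown) can be put into rough numbers. This is only a minimal Python sketch, assuming every task checkpoints at the same fixed interval and that an unplanned shutdown is equally likely at any moment within an interval. The function names are made up for illustration and are not part of BOINC or the CPDN applications.

```python
def checkpoint_writes_per_minute(tasks: int, interval_minutes: float) -> float:
    """Average number of checkpoint writes per minute across all running tasks.

    Assumes (hypothetically) that every task checkpoints once per interval.
    """
    if tasks < 0 or interval_minutes <= 0:
        raise ValueError("tasks must be >= 0 and interval_minutes must be > 0")
    return tasks / interval_minutes


def expected_lost_minutes(interval_minutes: float) -> float:
    """Average work lost per task on an unplanned stop.

    Assumes the stop lands uniformly at random inside a checkpoint interval,
    so on average half an interval of work is lost.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be > 0")
    return interval_minutes / 2.0
```

With the figures quoted in the thread, `checkpoint_writes_per_minute(8, 8)` gives one write per minute, and `expected_lost_minutes(120)` suggests about an hour of lost work per task for a 2-hour interval under the uniform assumption. Shorter intervals cut the loss but raise the write rate in proportion.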
©2024 cpdn.org