Thread 'New work Discussion'

Author	Message
Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64737 - Posted: 30 Oct 2021, 5:20:05 UTC - in response to Message 64736. But maybe my boinc-client is running a check to prevent my sending too-large .zip files to the upload server. Yes, now you've got it. :) The specific part of boinc-client that's a problem is where it says: <max_nbytes>150000000.000000</max_nbytes> I don't know what that's intended for, but it needs to be changed. I'm going to try it shortly, just for practice. But hopefully Sarah will abort this batch. More emails after the weekend. ID: 64737 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64740 - Posted: 31 Oct 2021, 17:55:11 UTC Put this in your cc_config and it won't happen: <max_file_xfers_per_project>1</max_file_xfers_per_project> ID: 64740 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64741 - Posted: 31 Oct 2021, 18:53:18 UTC - in response to Message 64740. Put this in your cc_config and it won't happen: <max_file_xfers_per_project>1</max_file_xfers_per_project> Won't help in many cases and may make problems more likely in some, e.g. if two or more zips 1,2 or 3 are created for other tasks they would all go first before the task finishing meaning the task completes, performs the check, finds the oversized zip4 and causes the task to fail. I will stick to editing the maximum file size for zips in client_state.xml as that works reliably. ID: 64741 ·
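For anyone who would rather script the client_state.xml edit mentioned above than do it by hand, here is a minimal sketch. It assumes the limit appears as plain `<max_nbytes>…</max_nbytes>` text exactly as quoted earlier in the thread, and that the BOINC client is stopped before the file is changed (otherwise the client is likely to overwrite the edit). The function name `raise_max_nbytes` and the new limit are illustrative, not part of BOINC, and the function raises every upload limit in the text it is given, not only the one for the final zip.

```python
import re

# Matches the tag exactly as quoted in the thread, e.g.
# <max_nbytes>150000000.000000</max_nbytes>
MAX_NBYTES = re.compile(r"(<max_nbytes>)([0-9.]+)(</max_nbytes>)")


def raise_max_nbytes(xml_text: str, new_limit: int) -> str:
    """Return xml_text with every max_nbytes below new_limit raised to it."""

    def bump(match: re.Match) -> str:
        current = float(match.group(2))
        if current >= new_limit:
            # Never lower a limit that is already high enough.
            return match.group(0)
        return f"{match.group(1)}{new_limit:.6f}{match.group(3)}"

    return MAX_NBYTES.sub(bump, xml_text)


# Hypothetical fragment; a real client_state.xml contains much more.
sample = (
    "<name>example_4.zip</name>\n"
    "<max_nbytes>150000000.000000</max_nbytes>\n"
)
patched = raise_max_nbytes(sample, 300_000_000)
print(patched)
```

To apply it, read the real file into a string, keep a backup copy, pass the string through the function and write the result back while the client is not running.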

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 64742 - Posted: 31 Oct 2021, 22:42:01 UTC - in response to Message 64740. Put this in your cc_config and it won't happen: <max_file_xfers_per_project>1</max_file_xfers_per_project> I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency. ID: 64742 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64743 - Posted: 1 Nov 2021, 7:36:28 UTC - in response to Message 64742. I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency. I presume that means you have some projects that have many short units so you get lots of small files. I can't imagine anyone running only CPDN gaining a lot from doing that but if wrong would quite like to understand the explanation. ID: 64743 ·

Bryn Mawr Send message Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 64744 - Posted: 1 Nov 2021, 12:12:01 UTC - in response to Message 64743. I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency. I presume that means you have some projects that have many short units so you get lots of small files. I can't imagine anyone running only CPDN gaining a lot from doing that but if wrong would quite like to understand the explanation. I run Rosetta, WCG, TN-Grid, SI-Dock and Ralph as well as CPDN. The work varies over time with, for example, SI-Dock varying from 10 minutes to 6 hours per WU. The 10 minute jobs were causing problems when the project suffered a drought then sent down 30-40 WUs and all the running WUs ended at the same time. I also restricted it to running a maximum of 16 at any one time. ID: 64744 ·
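For reference, the cc_config setting discussed in the last few posts sits inside the `<options>` section of cc_config.xml. The sketch below only builds that fragment as a string; the helper name `build_cc_config` is made up for illustration, and the value 5 is simply the one mentioned above. Whether 1, 2 or 5 suits a given host depends on the mix of projects, as the posts above make clear.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers_per_project: int) -> str:
    """Build a minimal cc_config.xml body with one transfer option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    setting = ET.SubElement(options, "max_file_xfers_per_project")
    setting.text = str(max_xfers_per_project)
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(5)
print(config_text)
```

An existing cc_config.xml may already hold other options, so in practice the new element would be merged into that file rather than replacing it.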

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64745 - Posted: 1 Nov 2021, 13:29:15 UTC Getting back to batches 920/921: Sarah has said, (in part): I think the issue is if all the restarts bunch and send back together then the limit is exceeded So, if the task finishes are staggered by Suspending, then it may be OK. So I'm NOT going to fiddle with client_state now, just tighten my seat belt, and get ready for the bumpy section of road at the end of the journey. :) ID: 64745 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64746 - Posted: 1 Nov 2021, 13:49:22 UTC - in response to Message 64745. In that case, I will continue to up the file size limits on the tasks I run. ID: 64746 ·

Richard Haselgrove Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,723,209 RAC: 7,531	Message 64747 - Posted: 1 Nov 2021, 14:17:18 UTC - in response to Message 64745. Getting back to batches 920/921: Sarah has said, (in part): I think the issue is if all the restarts bunch and send back together then the limit is exceeded I think that's a bad judgement call. There are many possible reasons for the final zip file to be delayed, and they will differ from country to country around the world. CPDN, of all projects, should be sensitive to global communication needs and difficulties. There are routing problems (leading to slow speeds); metered data allowances (perhaps more expensive at certain times of day); ditto for electricity supplies; and many more besides. A simple configuration change in the workunit generators solves 99% of all that. Just set the maximum file size to at least 10% above - I'd prefer 50% above - the largest file any researcher can imagine creating. It's a trivial, zero-cost, change for the project to make, but annoying and demoralising for the volunteers affected. I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I well remember two bits of advice that they drilled into me: 1) A number is useless unless the units are specified (watch out for those binary/decimal definitions of a megabyte). 2) Do every calculation twice. Once, to the highest precision the available hardware (a slide rule, in my day) is capable of. And again, on the back of an envelope or beer mat, to order-of-magnitude accuracy only. Catches those wayward decimal points. ID: 64747 ·
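The two points above (state the units, and leave a safety margin) can be put into numbers with a short sketch. The 150,000,000 figure is the max_nbytes value quoted earlier in the thread; the helper names and the idea of treating that figure as a "largest expected file" are assumptions made purely for illustration, not anything the project has stated.

```python
MB = 10**6    # decimal megabyte
MIB = 2**20   # binary mebibyte


def describe(n_bytes: int) -> str:
    """Show one byte count under both megabyte definitions."""
    return f"{n_bytes} bytes = {n_bytes / MB:.2f} MB = {n_bytes / MIB:.2f} MiB"


def limit_with_margin(largest_expected_bytes: int, margin_percent: int) -> int:
    """Largest expected size plus a percentage margin, rounded up.

    Integer arithmetic avoids floating-point surprises near the boundary.
    """
    scaled = largest_expected_bytes * (100 + margin_percent)
    return -(-scaled // 100)  # ceiling division


print(describe(150_000_000))
print(limit_with_margin(150_000_000, 10))
print(limit_with_margin(150_000_000, 50))
```

The gap between the decimal and binary readings is close to 5%, which is the same order as a 10% margin, so a limit written without units can quietly use up a good part of the headroom.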

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 64748 - Posted: 1 Nov 2021, 14:21:34 UTC Sarah and I are talking about current batches, the cure for which is to do a project abort, and start again. ID: 64748 ·

Richard Haselgrove Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,723,209 RAC: 7,531	Message 64749 - Posted: 1 Nov 2021, 15:04:41 UTC - in response to Message 64748. Fair enough, though please let me finish the four I've got running at the moment. ;-) I should have made it clear that - personally - I'd let already-issued tasks reach their natural conclusion. Most readers here will know how to handle them in their own particular circumstances by now, after all our discussions. But then alter the workunit generator as I suggested, and send out any remaining or failed tasks with a better safety margin. ID: 64749 ·

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 64750 - Posted: 1 Nov 2021, 15:34:38 UTC - in response to Message 64749. Fair enough, though please let me finish the four I've got running at the moment. ;-) Yes: three of my four have already uploaded the ....2.zip files and are over half done. The other one is over 1/3 done. They are taking about 20 seconds each to upload. Sun 31 Oct 2021 03:41:07 AM EDT | climateprediction.net | Started upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip Sun 31 Oct 2021 03:41:27 AM EDT | climateprediction.net | Finished upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip Sun 31 Oct 2021 06:04:34 AM EDT | climateprediction.net | Started upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip Sun 31 Oct 2021 06:04:54 AM EDT | climateprediction.net | Finished upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip Sun 31 Oct 2021 02:51:04 PM EDT | climateprediction.net | Started upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip Sun 31 Oct 2021 02:51:30 PM EDT | climateprediction.net | Finished upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip ID: 64750 ·
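The "about 20 seconds each" figure can be checked directly from the log lines quoted above. This is a rough sketch that assumes the event log format shown in that post (weekday, date, 12-hour time, time zone, then project and message separated by " | "), an English locale for month names, and that the start and finish of one upload fall in the same time zone. The function name `upload_durations` is illustrative only.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^\w{3} (?P<stamp>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [AP]M) \w+ \| "
    r"(?P<project>[^|]+?) \| (?P<event>Started|Finished) upload of (?P<name>\S+)$"
)


def upload_durations(lines):
    """Map each uploaded file name to its upload time in seconds."""
    started = {}
    durations = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not an upload start/finish line
        when = datetime.strptime(match["stamp"], "%d %b %Y %I:%M:%S %p")
        name = match["name"]
        if match["event"] == "Started":
            started[name] = when
        elif name in started:
            durations[name] = (when - started.pop(name)).total_seconds()
    return durations


SAMPLE = [
    "Sun 31 Oct 2021 03:41:07 AM EDT | climateprediction.net | Started upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip",
    "Sun 31 Oct 2021 03:41:27 AM EDT | climateprediction.net | Finished upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip",
    "Sun 31 Oct 2021 06:04:34 AM EDT | climateprediction.net | Started upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip",
    "Sun 31 Oct 2021 06:04:54 AM EDT | climateprediction.net | Finished upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip",
    "Sun 31 Oct 2021 02:51:04 PM EDT | climateprediction.net | Started upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip",
    "Sun 31 Oct 2021 02:51:30 PM EDT | climateprediction.net | Finished upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip",
]

for name, seconds in upload_durations(SAMPLE).items():
    print(f"{seconds:.0f} s  {name}")
```

On these three uploads the durations come out at 20, 20 and 26 seconds, which matches the estimate in the post.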

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64752 - Posted: 1 Nov 2021, 16:15:00 UTC I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. They have a new one on a newish campus now. I occasionally drive past it and cycle past it much more often. ID: 64752 ·

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 64753 - Posted: 1 Nov 2021, 17:40:17 UTC - in response to Message 64752. I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I went to a large engineering school in Cambridge, Massachusetts for a while. My friend went to another in California. It was called Cal-Tech (California Institute of Technology). Where I went got called No-Cal-Tech for a while, reminiscent of a no-sugar soft drink, No-Cal. (Its full name was Massachusetts Institute of Technology.) This link will tell you way more than you probably want to know about No-Cal. https://culinarylore.com/drinks:what-was-the-first-diet-soda/ ID: 64753 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64754 - Posted: 1 Nov 2021, 18:41:23 UTC - in response to Message 64747. I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I well remember two bits of advice that they drilled into me: 1) A number is useless unless the units are specified (watch out for those binary/decimal definitions of a megabyte). 2) Do every calculation twice. Once, to the highest precision the available hardware (a slide rule, in my day) is capable of. And again, on the back of an envelope or beer mat, to order-of-magnitude accuracy only. Catches those wayward decimal points. When I got my physics degree using my dad's slide rule we learned that the three things physicists do most are: add and subtract zero, multiply and divide by one, and call it something else. :-) It's amazing how much math my generation can do in our heads compared to kids today that need a calculator to do the most rudimentary arithmetic. This project could easily do ten or twenty times as much work if they'd just make some improvements. ID: 64754 ·

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944	Message 64755 - Posted: 1 Nov 2021, 18:46:35 UTC Last modified: 1 Nov 2021, 18:47:51 UTC This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. The same convention is used by the Tavistock Clinic and The Institute of Group Analysis in referring to each other. (They are not that much further apart than the houses of commons and lords.) ID: 64755 ·

Aurum Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318	Message 64756 - Posted: 1 Nov 2021, 18:50:18 UTC - in response to Message 64755. Last modified: 1 Nov 2021, 18:52:53 UTC This project could easily do ten or twenty times as much work if they'd just make some improvements. Only if it had ten or twenty times as many researchers asking Oxford to send work out for them. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. ID: 64756 ·

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 64757 - Posted: 1 Nov 2021, 20:47:09 UTC - in response to Message 64756. Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements. What improvements do you have in mind? My main machine runs Linux, has a 16-core processor, of which I am willing to allocate 8 for Boinc work. And really, 8 of the 16 cores are hyperthreaded, so I should not count those as full speed ones. I limit the boinc-client to eight because when I run more Boinc tasks, the machine will run too hot unless I run the fans faster, when they get annoyingly loud. My other machine is smaller and slower but runs Windows 10. It ran some CPDN tasks about a year ago when I first set it up, but has not gotten any since then. So the only change that CPDN could make is to provide more Windows tasks. It has already been explained why that is not presently happening. Now CPDN is the most important of my tasks so I have the priorities set up so as to run 50% CPDN, 25% WCG, 12.5% Rosetta, and 1% Universe. At least, that is my objective. If all of them run normally for a couple of weeks, that is about how it works out. As far as cores are concerned, I allow up to 4 CPDN to run at once, 5 WCG to run at once, 3 Rosetta to run at once, and one Universe to run. The only change CPDN could make to increase my work contribution would be to always have new tasks available, and it is pretty clear why they cannot always do that. I have a pretty large processor cache, but the sweet spot is probably around 4 (maybe only 3) processors. I tried 5 for a week or so, but then they slowed down, probably running out of cache. ID: 64757 ·
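The percentages quoted above add up to 88.5 rather than 100. BOINC treats resource shares as relative weights, so one way to read them is to normalise by their sum. The sketch below does just that; treating the stated percentages as the actual resource share values is an assumption, and the function name `expected_fractions` is illustrative.

```python
def expected_fractions(shares):
    """Convert relative resource-share weights into long-run fractions."""
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


# Weights as stated in the post above (they sum to 88.5, not 100).
shares = {"CPDN": 50, "WCG": 25, "Rosetta": 12.5, "Universe": 1}

fractions = expected_fractions(shares)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

On that reading CPDN would get a little over half of the long-run computing time, and only when every project actually has work to send.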

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 64758 - Posted: 1 Nov 2021, 22:31:29 UTC - in response to Message 64754. It's amazing how much math my generation can do in our heads compared to kids today that need a calculator to do the most rudimentary arithmetic. At my engineering school, order-of-magnitude calculations were emphasized, to catch the mistakes that people made with more precise methods. Also, it gave you a greater physical feel for the subject matter. I think many political mistakes are made by people who have not the slightest idea of the magnitude of what they are talking about. ID: 64758 ·