Thread 'New work Discussion'

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64737 - Posted: 30 Oct 2021, 5:20:05 UTC - in response to Message 64736.  

But maybe my boinc-client is running a check to prevent my sending too-large .zip files to the upload server.

Yes, now you've got it. :)

The specific part of boinc-client that's a problem is where it says: <max_nbytes>150000000.000000</max_nbytes>

I don't know what that's intended for, but it needs to be changed.
I'm going to try it shortly, just for practice.

But hopefully Sarah will abort this batch. More emails after the weekend.
ID: 64737
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64740 - Posted: 31 Oct 2021, 17:55:11 UTC

Put this in your cc_config and it won't happen:
<max_file_xfers_per_project>1</max_file_xfers_per_project>
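For context, here is a minimal sketch of where that option would sit in the file. It assumes the usual cc_config.xml layout, with the option nested under `<options>` inside a `<cc_config>` root. The helper name `build_cc_config` is purely illustrative and not part of BOINC. The generated text would still have to be saved as cc_config.xml in the BOINC data directory and re-read by the client.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_xfers_per_project: int) -> str:
    """Return cc_config.xml text that limits simultaneous transfers per project.

    build_cc_config is a hypothetical helper, used here for illustration only.
    """
    root = ET.Element("cc_config")
    # Assumption: the option lives inside the <options> section.
    options = ET.SubElement(root, "options")
    limit = ET.SubElement(options, "max_file_xfers_per_project")
    limit.text = str(max_xfers_per_project)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(1))
```

Generating the file this way only guarantees well-formed XML. Whether a limit of 1 actually avoids the failure is disputed in the replies that follow.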
ID: 64740
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64741 - Posted: 31 Oct 2021, 18:53:18 UTC - in response to Message 64740.  

Put this in your cc_config and it won't happen:
<max_file_xfers_per_project>1</max_file_xfers_per_project>

This won't help in many cases and may make problems more likely in some. For example, if zips 1, 2 or 3 from two or more other tasks are created at the same time, they would all be uploaded first, ahead of the zip from the task that is finishing. That task then completes, performs the check, finds the oversized zip4 and fails. I will stick to editing the maximum file size for zips in client_state.xml, as that works reliably.
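As a rough illustration of that kind of edit, the sketch below raises `<max_nbytes>` values in a piece of client_state.xml text. It is not the poster's actual procedure: the function name `raise_max_nbytes` and the new limit of 300,000,000 bytes are assumptions chosen for the example. It also raises every matching entry, whereas a real edit would presumably target only the CPDN zip entries and be made while the client is stopped.

```python
import re

# Matches <max_nbytes>NUMBER</max_nbytes> and captures the number.
MAX_NBYTES = re.compile(r"(<max_nbytes>)([0-9.]+)(</max_nbytes>)")


def raise_max_nbytes(state_text: str, new_limit: int) -> str:
    """Raise every <max_nbytes> value that is below new_limit up to new_limit."""

    def bump(match: re.Match) -> str:
        current = float(match.group(2))
        if current >= new_limit:
            return match.group(0)  # already large enough, leave untouched
        # Keep the six-decimal format seen in the thread's example.
        return f"{match.group(1)}{new_limit:.6f}{match.group(3)}"

    return MAX_NBYTES.sub(bump, state_text)


if __name__ == "__main__":
    sample = "<max_nbytes>150000000.000000</max_nbytes>"
    print(raise_max_nbytes(sample, 300_000_000))  # hypothetical new limit
```

Entries that already allow at least the new limit are left unchanged, so running it twice has no further effect.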
ID: 64741
Bryn Mawr
Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 64742 - Posted: 31 Oct 2021, 22:42:01 UTC - in response to Message 64740.  

Put this in your cc_config and it won't happen:
<max_file_xfers_per_project>1</max_file_xfers_per_project>


I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency.
ID: 64742
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64743 - Posted: 1 Nov 2021, 7:36:28 UTC - in response to Message 64742.  

I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency.


I presume that means you have some projects that have many short units so you get lots of small files. I can't imagine anyone running only CPDN gaining a lot from doing that but if wrong would quite like to understand the explanation.
ID: 64743
Bryn Mawr
Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 64744 - Posted: 1 Nov 2021, 12:12:01 UTC - in response to Message 64743.  

I had to edit mine from the default of 2 to a working value of 5 to improve the efficiency.


I presume that means you have some projects that have many short units so you get lots of small files. I can't imagine anyone running only CPDN gaining a lot from doing that but if wrong would quite like to understand the explanation.


I run Rosetta, WCG, TN-Grid, SI-Dock and Ralph as well as CPDN. The work varies over time with, for example, SI-Dock varying from 10 minutes to 6 hours per WU. The 10 minute jobs were causing problems when the project suffered a drought then sent down 30-40 WUs and all the running WUs ended at the same time.

I also restricted it to running a maximum of 16 at any one time.
ID: 64744
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64745 - Posted: 1 Nov 2021, 13:29:15 UTC

Getting back to batches 920/921:

Sarah has said, (in part):
I think the issue is if all the restarts bunch and send back together then the limit is exceeded

So, if the task finishes are staggered by suspending, then it may be OK.
So I'm NOT going to fiddle with client_state now, just tighten my seat belt, and get ready for the bumpy section of road at the end of the journey. :)
ID: 64745
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64746 - Posted: 1 Nov 2021, 13:49:22 UTC - in response to Message 64745.  

In that case, I will continue to up the file size limits on the tasks I run.
ID: 64746
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,723,209
RAC: 7,531
Message 64747 - Posted: 1 Nov 2021, 14:17:18 UTC - in response to Message 64745.  

Getting back to batches 920/921:

Sarah has said, (in part):
I think the issue is if all the restarts bunch and send back together then the limit is exceeded

I think that's a bad judgement call.

There are many possible reasons for the final zip file to be delayed, and they will differ from country to country around the world. CPDN, of all projects, should be sensitive to global communication needs and difficulties. There are routing problems (leading to slow speeds); metered data allowances (perhaps more expensive at certain times of day); ditto for electricity supplies; and many more besides.

A simple configuration change in the workunit generators solves 99% of all that. Just set the maximum file size to at least 10% above - I'd prefer 50% above - the largest file any researcher can imagine creating. It's a trivial, zero-cost, change for the project to make, but annoying and demoralising for the volunteers affected.
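The suggested margin is simple arithmetic. The sketch below uses integer maths to avoid rounding surprises. The 160,000,000-byte "largest file" is a made-up figure for illustration, not a measured CPDN value, and `limit_with_margin` is a hypothetical helper name.

```python
def limit_with_margin(largest_expected_bytes: int, margin_percent: int) -> int:
    """Return a size limit margin_percent above the largest expected file.

    Uses integer arithmetic and rounds up to the next whole byte.
    """
    scaled = largest_expected_bytes * (100 + margin_percent)
    return (scaled + 99) // 100  # ceiling division by 100


if __name__ == "__main__":
    largest = 160_000_000  # hypothetical largest zip, in bytes
    for percent in (10, 50):
        print(percent, limit_with_margin(largest, percent))
```

With that hypothetical input, a 10% margin gives 176,000,000 bytes and a 50% margin gives 240,000,000 bytes.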

I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I well remember two bits of advice that they drilled into me:
1) A number is useless unless the units are specified (watch out for those binary/decimal definitions of a megabyte).
2) Do every calculation twice. Once, to the highest precision the available hardware (a slide rule, in my day) is capable of. And again, on the back of an envelope or beer mat, to order-of-magnitude accuracy only. Catches those wayward decimal points.
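Both bits of advice can be tried on the 150,000,000-byte limit quoted earlier in the thread. This is only a sketch: it assumes the figure is in bytes, and the helper `order_of_magnitude` is an illustrative name.

```python
import math

MAX_NBYTES = 150_000_000  # limit quoted earlier in the thread, assumed to be bytes

# 1) Units: the same number of bytes under the two "megabyte" definitions.
decimal_mb = MAX_NBYTES / 10**6  # 1 MB  = 1,000,000 bytes
binary_mib = MAX_NBYTES / 2**20  # 1 MiB = 1,048,576 bytes


# 2) Back-of-envelope check: compare only the power of ten.
def order_of_magnitude(value: float) -> int:
    """Return the power of ten of value, e.g. 8 for 1.5e8."""
    return math.floor(math.log10(abs(value)))


if __name__ == "__main__":
    print(f"{decimal_mb:.2f} MB, {binary_mib:.2f} MiB")
    print("order of magnitude:", order_of_magnitude(MAX_NBYTES))
```

The two definitions differ by nearly 5% at this size (150 MB against roughly 143 MiB), which is enough to matter when a file sits close to the limit.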
ID: 64747
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64748 - Posted: 1 Nov 2021, 14:21:34 UTC

Sarah and I are talking about current batches, the cure for which is to do a project abort, and start again.
ID: 64748
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,723,209
RAC: 7,531
Message 64749 - Posted: 1 Nov 2021, 15:04:41 UTC - in response to Message 64748.  

Fair enough, though please let me finish the four I've got running at the moment. ;-)

I should have made it clear that - personally - I'd let already-issued tasks reach their natural conclusion. Most readers here will know how to handle them in their own particular circumstances by now, after all our discussions.

But then alter the workunit generator as I suggested, and send out any remaining or failed tasks with a better safety margin.
ID: 64749
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64750 - Posted: 1 Nov 2021, 15:34:38 UTC - in response to Message 64749.  

Fair enough, though please let me finish the four I've got running at the moment. ;-)


Yes: three of my four have already uploaded the ....2.zip files and are over half done. The other one is over 1/3 done. They are taking about 20 seconds each to upload.

Sun 31 Oct 2021 03:41:07 AM EDT | climateprediction.net | Started upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip
Sun 31 Oct 2021 03:41:27 AM EDT | climateprediction.net | Finished upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip
Sun 31 Oct 2021 06:04:34 AM EDT | climateprediction.net | Started upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip
Sun 31 Oct 2021 06:04:54 AM EDT | climateprediction.net | Finished upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip
Sun 31 Oct 2021 02:51:04 PM EDT | climateprediction.net | Started upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip
Sun 31 Oct 2021 02:51:30 PM EDT | climateprediction.net | Finished upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip
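The "about 20 seconds" figure can be checked directly from those log lines. The sketch below pairs each "Started upload" with its "Finished upload" and reports the elapsed time. The function names are illustrative, and the timestamp parsing assumes an English locale and a single time zone throughout, so the "EDT" suffix is simply dropped.

```python
from datetime import datetime

LOG_LINES = [
    "Sun 31 Oct 2021 03:41:07 AM EDT | climateprediction.net | Started upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip",
    "Sun 31 Oct 2021 03:41:27 AM EDT | climateprediction.net | Finished upload of hadam4h_h12y_200902_4_920_012116620_1_r2094132970_2.zip",
    "Sun 31 Oct 2021 06:04:34 AM EDT | climateprediction.net | Started upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip",
    "Sun 31 Oct 2021 06:04:54 AM EDT | climateprediction.net | Finished upload of hadam4h_10x3_209602_4_921_012118509_0_r245662154_2.zip",
    "Sun 31 Oct 2021 02:51:04 PM EDT | climateprediction.net | Started upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip",
    "Sun 31 Oct 2021 02:51:30 PM EDT | climateprediction.net | Finished upload of hadam4h_11cx_209902_4_921_012119079_0_r1553206733_2.zip",
]


def parse_line(line):
    """Split one log line into (timestamp, action, filename)."""
    stamp, _project, message = [part.strip() for part in line.split("|")]
    # Drop the trailing time-zone abbreviation; every line uses the same one.
    stamp = stamp.rsplit(" ", 1)[0]
    when = datetime.strptime(stamp, "%a %d %b %Y %I:%M:%S %p")
    action, filename = message.split(" upload of ")
    return when, action, filename


def upload_durations(lines):
    """Return {filename: seconds between its Started and Finished lines}."""
    started = {}
    durations = {}
    for line in lines:
        when, action, filename = parse_line(line)
        if action == "Started":
            started[filename] = when
        elif action == "Finished" and filename in started:
            durations[filename] = (when - started[filename]).total_seconds()
    return durations


if __name__ == "__main__":
    for name, seconds in upload_durations(LOG_LINES).items():
        print(f"{seconds:5.0f} s  {name}")
```

For the three uploads logged above this gives 20, 20 and 26 seconds.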

ID: 64750
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64751 - Posted: 1 Nov 2021, 15:35:42 UTC - in response to Message 64749.  
Last modified: 1 Nov 2021, 15:37:04 UTC

Fair enough, though please let me finish the four I've got running at the moment. ;-)


Yes: three of my four have already uploaded the ....2.zip files and are over half done. The other one is over 1/3 done.
SORRY it posted twice.
ID: 64751
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64752 - Posted: 1 Nov 2021, 16:15:00 UTC

I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland.


They have a new one on a newish campus now. I occasionally drive past it and cycle past it much more often.
ID: 64752
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64753 - Posted: 1 Nov 2021, 17:40:17 UTC - in response to Message 64752.  

I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland.


I went to a large engineering school in Cambridge, Massachusetts for a while. My friend went to another in California. It was called Cal-Tech (California Institute of Technology).
Where I went got called No-Cal-Tech for a while, reminiscent of a no-sugar soft drink, No-Cal. (Its full name was Massachusetts Institute of Technology.)

This link will tell you way more than you probably want to know about No-Cal.

https://culinarylore.com/drinks:what-was-the-first-diet-soda/
ID: 64753
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64754 - Posted: 1 Nov 2021, 18:41:23 UTC - in response to Message 64747.  

I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland. I well remember two bits of advice that they drilled into me:
1) A number is useless unless the units are specified (watch out for those binary/decimal definitions of a megabyte).
2) Do every calculation twice. Once, to the highest precision the available hardware (a slide rule, in my day) is capable of. And again, on the back of an envelope or beer mat, to order-of-magnitude accuracy only. Catches those wayward decimal points.
When I got my physics degree using my dad's slide rule we learned that the three things physicists do most are: add and subtract zero, multiply and divide by one, and call it something else. :-)
It's amazing how much math my generation can do in our heads compared to kids today that need a calculator to do the most rudimentary arithmetic.

This project could easily do ten or twenty times as much work if they'd just make some improvements.
ID: 64754
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64755 - Posted: 1 Nov 2021, 18:46:35 UTC
Last modified: 1 Nov 2021, 18:47:51 UTC

This project could easily do ten or twenty times as much work if they'd just make some improvements.


Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.

I was trained in Physics at 'the other place' - the Cavendish Laboratory in fenland.


The same convention is used by the Tavistock Clinic and The Institute of Group Analysis in referring to each other. (They are not that much further apart than the houses of commons and lords.)
ID: 64755
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64756 - Posted: 1 Nov 2021, 18:50:18 UTC - in response to Message 64755.  
Last modified: 1 Nov 2021, 18:52:53 UTC

This project could easily do ten or twenty times as much work if they'd just make some improvements.
Only if it had ten or twenty times as many researchers asking Oxford to send work out for them.
Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.
ID: 64756
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64757 - Posted: 1 Nov 2021, 20:47:09 UTC - in response to Message 64756.  

Ok. This project could easily do ten or twenty times as much work per unit time if they'd just make some improvements.


What improvements do you have in mind?

My main machine runs Linux and has a 16-core processor, of which I am willing to allocate 8 for Boinc work. And really, 8 of the 16 cores are hyperthreaded, so I should not count those as full-speed ones. I limit the boinc-client to eight because when I run more Boinc tasks, the machine will run too hot unless I run the fans faster, when they get annoyingly loud.

My other machine is smaller and slower but runs Windows 10. It ran some CPDN tasks about a year ago when I first set it up, but has not gotten any since then. So the only change that CPDN could make is to provide more Windows tasks. It has already been explained why that is not presently happening.

Now CPDN is the most important of my tasks so I have the priorities set up so as to run 50% CPDN, 25% WCG, 12.5% Rosetta, and 1% Universe. At least, that is my objective. If all of them run normally for a couple of weeks, that is about how it works out. As far as cores are concerned, I allow up to 4 CPDN to run at once, 5 WCG to run at once, 3 Rosetta to run at once, and one Universe to run.
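Those percentages do not add up to 100, which is fine if they are treated as relative weights. The sketch below assumes the figures act like BOINC resource shares, where only the ratio between projects matters; the function name `normalised_shares` is illustrative.

```python
def normalised_shares(shares):
    """Convert relative weights into fractions that sum to 1."""
    total = sum(shares.values())
    return {project: weight / total for project, weight in shares.items()}


if __name__ == "__main__":
    # Figures quoted in the post, treated as relative weights (an assumption).
    quoted = {"CPDN": 50.0, "WCG": 25.0, "Rosetta": 12.5, "Universe": 1.0}
    for project, fraction in normalised_shares(quoted).items():
        print(f"{project:9s} {fraction * 100:5.1f}%")
```

On that reading, CPDN would get about 56.5% of the long-run total rather than exactly 50%.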

The only change CPDN could make to increase my work contribution would be to always have new tasks available, and it is pretty clear why they cannot always do that. I have a pretty large processor cache, but the sweet spot is probably around 4 (maybe only 3) processors. I tried 5 for a week or so, but then they slowed down, probably running out of cache.
ID: 64757
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64758 - Posted: 1 Nov 2021, 22:31:29 UTC - in response to Message 64754.  

It's amazing how much math my generation can do in our heads compared to kids today that need a calculator to do the most rudimentary arithmetic.

At my engineering school, order-of-magnitude calculations were emphasized, to catch the mistakes that people did with more precise methods.
Also, it gave you a greater physical feel for the subject matter. I think many political mistakes are made by people who have not the slightest idea of the magnitude of what they are talking about.
ID: 64758