sulphur seems slower than slab
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
The problem with all this is that the current and future phases of CPDN are all going to be large. Obviously there is important work that can only be done on faster, always-on boxes. I accept that, and I accept that the boxes I can donate at present do not qualify for it. But surely the scientists can find something useful for the medium-speed machines to do? And something for the fast machines that are only on in business hours? (A 37-hr week corresponds to around a 78% reduction in the throughput of a box, so that a 2.8GHz box in an office is worth about the same to BOINC as a 700MHz box on 24/7.) If there really isn't sensible work for these categories then I and others will best serve the project by leaving once our slabs are done. But before we do, the scientists need to be sure that they won't think of something useful we could have done just after they have let us all go.
River~~
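To make the arithmetic in that post concrete, here is a minimal sketch. The 37-hour week and the clock speeds come from the post itself; that throughput scales linearly with clock speed and uptime is a simplifying assumption.

[code]
# Effective BOINC throughput of an office machine on a 37-hour week.
# Assumes throughput scales linearly with clock speed and uptime.
HOURS_PER_WEEK = 24 * 7                 # 168 wall-clock hours

office_hours = 37
uptime = office_hours / HOURS_PER_WEEK  # ~0.22
reduction = 1 - uptime                  # ~0.78, i.e. a 78% cut

effective_ghz = 2.8 * uptime            # ~0.62 GHz, roughly a 700 MHz box
print(f"uptime {uptime:.0%}, reduction {reduction:.0%}")
print(f"2.8 GHz on office hours ~ {effective_ghz:.2f} GHz running 24/7")
[/code]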
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275
I am very surprised. This is what I think they tried, and the following happened: up to BOINC 4.4x, the estimated time to completion was very good, but then 4.7x and above came along with its grossly inaccurate (usually vastly overestimated) time to completion. People with fast PCs who suddenly wanted to give more time to CPDN because of sulphur weren't given sulphur, because they previously hadn't devoted a large enough percentage of time to CPDN. So this attempt didn't work very well.
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
People with fast PCs who suddenly wanted to give more time to cpdn because of sulphur weren't given sulphur because they previously hadn't devoted a large enough percentage of time to cpdn.

I think that was working OK - it was a case of user education being needed to tell people to change their resource shares, or to detach / re-attach at the end of their current run so the database forgot their past lower share. Instead, what has happened is even more unfriendly to another group of users. What this change has actually done is to steal resource share where it is not offered. Four of my boxes had a 51% resource share for CPDN. They would comfortably have finished slabs inside the deadline on that resource share. Instead they downloaded sulphur WUs and went right into EDF mode, threatening to take a 100% share for CPDN for many months on end. My response: to detach those four machines. In order to get sulphur onto the boxes that wanted it, the project has forced it onto boxes that don't want it, some of which can't handle it comfortably. I'd encourage the project team to look again at the 'failed' attempt to offer a choice. That attempt could actually have been made to work with a little user education. I suggest that the code was right; it was just a matter of showing users how to drive it.
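For readers unfamiliar with EDF (earliest deadline first): a rough sketch of the kind of deadline check that pushes a client into EDF mode. The function, the decision rule, and all the numbers are illustrative only; the real BOINC client runs a more elaborate simulation of its whole queue.

[code]
# Illustrative only: not the actual BOINC client logic.
def needs_edf(remaining_cpu_hours, resource_share, on_fraction,
              hours_to_deadline):
    # CPU hours the project would get under its normal share
    available = hours_to_deadline * on_fraction * resource_share
    return remaining_cpu_hours > available

# Hypothetical slab-sized job at 51% share on an always-on box with a
# one-year deadline: fits, so normal time-slicing continues.
print(needs_edf(700, 0.51, 1.0, 8760))    # False

# Hypothetical sulphur-sized job on a box that is on a third of the
# time: does not fit, so the client goes into EDF and takes 100% CPU.
print(needs_edf(2000, 0.51, 0.33, 8760))  # True
[/code]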
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
Re: time share in the old situation where you got a slab or a sulphur.
I found out that the time share was not the only thing influencing which model you were pushed. As said earlier, on that 2.4G I got slabs, until sulphur became mandatory. Then it refused to give me a sulphur because the disk space reservation was not big enough, and asked me to change preferences. In the old situation I did not get a message about the disk space, but just got another slab. Here is the log snippet:

2005-12-07 06:45:42 [---] May run out of work in 0.50 days; requesting more
2005-12-07 06:46:23 [---] request_reschedule_cpus: project op
2005-12-07 06:46:23 [---] schedule_cpus: must schedule
2005-12-07 06:46:29 [climateprediction.net] Requesting 31849.51 seconds of work
2005-12-07 06:46:29 [climateprediction.net] Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2005-12-07 06:46:31 [climateprediction.net] Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2005-12-07 06:46:31 [climateprediction.net] Message from server: No work sent
2005-12-07 06:46:31 [climateprediction.net] Message from server: (there was work but you don't have enough disk space allocated)
2005-12-07 06:46:31 [climateprediction.net] Message from server: Not enough disk space (only 466.2 MB free for BOINC). Review preferences for minimum disk free space allowed.
2005-12-07 06:46:31 [climateprediction.net] No work from project
2005-12-07 06:46:31 [climateprediction.net] Deferring communication with project for 1 days, 0 hours, 0 minutes, and 0 seconds
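A minimal sketch of the kind of check behind that "No work sent" refusal. The function name, the field names, and the sulphur disk figure are assumptions for illustration, not the actual CPDN scheduler code; only the 466.2 MB comes from the log above.

[code]
# Hypothetical sketch of a scheduler-side disk check.
def can_send(wu_disk_bound_mb, free_for_boinc_mb):
    # Refuse any workunit whose declared disk bound exceeds what the
    # host's preferences leave free for BOINC.
    return wu_disk_bound_mb <= free_for_boinc_mb

# Assuming a sulphur workunit declares a bound of roughly 600 MB and
# the host in the log had only 466.2 MB free for BOINC:
print(can_send(600, 466.2))   # False -> "No work sent"
[/code]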
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
Re: time share in the old situation where you got a slab or a sulphur.

Thanks Kilcock, that is exactly my point. The appropriate fix, in my opinion, was to improve the messages to the user so that the user knew what changes to make in order to get sulphur, if that was what the user wanted. With the prefs you had set, sulphur was inappropriate because it would have used more than its allotted share of disk space. Now, in that situation, you'd get no work at all from CPDN: no slab, no sulphur. You then look at the situation and decide whether you want to tweak project share and/or disk space. User control. But if it is the time slice that is deficient, CPDN gives you a sulphur anyhow and forces you into EDF to make time for it. Then, if you don't want to accept that, the only real option is to detach. I am saying that if the time slice is deficient we should get a slab (if slabs are still of any scientific benefit) or should get nothing (if slabs are now totally redundant). We then decide for ourselves whether to tweak our prefs or not. With a threefold increase in the minimum time commitment, it is not a programming error if the standard BOINC software alerts users to the fact that they need to increase their resource shares to accommodate the new WU. In my opinion it is a pity that the software team chose to work around what was in fact a very timely warning. I am glad you got what you wanted. I am profoundly glad that there are many people like you who do have sufficiently fast boxes to contribute, because I sincerely believe this project is good for all of us. I still feel, though, that an approach that allowed more user choice was within reach and would have been even better than the current situation.
Joined: 3 Sep 04 Posts: 268 Credit: 256,045 RAC: 0
But surely the scientists can find something useful for the medium-speed machines to do? And something for the fast machines that are only on in business hours? (A 37-hr week corresponds to around a 78% reduction in the throughput of a box, so that a 2.8GHz box in an office is worth about the same to BOINC as a 700MHz box on 24/7.)

Same prob for me. My old P4 2GHz is going to complete a slab model tomorrow, and as I don't want it to crunch a sulphur WU (too long), I wonder what it could do for CP. It would be a pity to detach it from CP, IMHO. (A very stable machine that has done a lot of alpha tests for CP: no other alpha test soon?)
Arnaud
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
Same prob for me.

My best suggestion is to set "no more work" for now, attach to another project that you feel is next best to CPDN, and stay around for a little while to see if the project people do come up with a viable alternative.
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
Same prob for me.

No, I think we are not helped by driving people away from CPDN. Personally I consider a 2GHz machine not top of the range, but fast enough to handle a sulphur. Estimate it at 2000/168 = 11.9 weeks running 24/7, and just within a year at 8 hours a day, 5 days a week (office hours). Actually I only have one machine faster than that, this 2.4G, but today I lost a sulphur at 3.2% as the machine crashed due to hard disk errors, and I had to open a hardware support call.
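The estimate, worked through. The ~2000 CPU hours figure is the post's own rough number, not an official runtime.

[code]
TOTAL_CPU_HOURS = 2000        # rough sulphur runtime quoted in the post

print(TOTAL_CPU_HOURS / 168)  # 24/7 (168 h/week): ~11.9 weeks
print(TOTAL_CPU_HOURS / 40)   # 8 h/day, 5 days/week: 50 weeks,
                              # just within a year
[/code]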
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0
My understanding of the long-term plan of the project was that they would be running earlier models and using the output of those to feed into new models, and from there back into other models. Obviously this is a confused description, but this overall scheme really would support a division of work well. I basically have decent speed on all my machines and so would be a candidate for all of the longer-running models (groan), thus allowing people with slower computers to run other models. Perhaps the models and the divisions need to be rethought. Slower machines could do the easier parts and we then finish up on other computers ... well, just random neuron firing ...
Joined: 3 Sep 04 Posts: 268 Credit: 256,045 RAC: 0
Well, I'm not driven away from CP, as I have one "fast" machine running 20 hours/day on a spinup model. But my other machine (the old 2GHz) just runs 2 or 3 hours/day and completed a slab in 5 months, so sulphur would take about 10 to 15 months, and I think that's a bit too long and probably not very useful for the CPDN scientists.
Arnaud
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
My understanding of the long-term plan of the project was that they would be running earlier models and using the output of those to feed into new models, and from there back into other models. Obviously this is a confused description, but this overall scheme really would support a division of work well.

First part: indeed, I recall something like that. Users probably have several completed slabs stored on disk. Those could in principle be used as the basis for the two sulphur phases.
Re last part: the problem is going to be that you have to deal with a data transfer of about 320 MB per model, which is a bit much for modem users. I see more promise in the first part, or in the ability to run a single model on multiple processors/machines, but I will admit that this is music of the future.
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
Re last part: [passing phase 1 data around] The problem is going to be that you have to deal with a data transfer of about 320 MB per model, which is a bit much for modem users.

... and a bit much for people on limited-download broadband. The cheapest ADSL lines in the UK tend to be capped at as little as 1 GB/month with hefty overrun charges, and 320 MB would be a third of the way there. For those with an always-on unmetered connection this would be a far better method. Rather than spending 2 weeks crunching the same phase 1 data, you could spend 2 weeks on a slow download at around 1 MB/hour and you wouldn't even notice the impact on your connection. In the meantime you'd be crunching another project or another CPDN WU.

This once again means that a degree of user choice needs to be built in: will you accept huge downloads, yes/no; or, better, specify a bandwidth limit for huge downloads (0 => no huge downloads allowed). It would be good *not* to just rely on the current bandwidth limits in preferences: I am happy for Predictor or Einstein to take all my bandwidth for a minute or so, but would not want 320 MB to come down at full speed.

[edit] Idea: is all the data needed at timestep 1 of the sulphur run? If not, would it be possible to download it in [pun] bits [/pun] at each trickle - maybe getting the data at trickle N that will be needed just after trickle N+1? 30 MB to get going and then 22 trickles of 15 MB feels a lot less intimidating than 320 MB in a single lump. This might also be better for the project servers. Again, maybe too much for modem users and some metered ADSL, but very manageable for those who are unmetered and always online.

The science is driving the programs to get longer and more complex. That is good. It does mean, however, that to get the best out of the diverse array of donated machines, the project team will need to offer flexibility to donors. Machines vary in speed, donors' generosity is sometimes limited by outside organizations and sometimes by issues like metered connections, and donors also vary among ourselves in how long a run we feel happy to commit to.

A more general comment to the project team: please don't take any of my posts on this topic as complaints: I appreciate the efforts you (programmers & scientists) are making. Please take my comments as suggestions from a supporter. The more choices you programmers give us donors, the more machines and the more teraflops you will have for the scientists. And, in my opinion, that can only be good for the planet and thus good for all of us.
River~~
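A sketch of the staged-download idea from that post. The chunk sizes are the post's own rough numbers, and the scheduling function is purely illustrative; nothing like this existed in the CPDN client.

[code]
# Hypothetical: fetch the 320 MB of phase-1 data in trickle-sized
# pieces instead of one lump.
INITIAL_MB = 30    # enough to start timestep 1
CHUNK_MB = 15      # fetched alongside each trickle upload

def schedule_chunks(total_mb):
    """Yield (trickle_number, chunk_mb) until all data has arrived."""
    fetched = INITIAL_MB
    trickle = 0
    while fetched < total_mb:
        trickle += 1
        chunk = min(CHUNK_MB, total_mb - fetched)
        fetched += chunk
        yield trickle, chunk   # data needed just after trickle + 1

for n, mb in schedule_chunks(320):
    print(f"after trickle {n}: fetch {mb} MB")
[/code]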
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
[edit] Idea: is all the data needed at timestep 1 of the sulphur run? If not, would it be possible to download it in [pun] bits [/pun] at each trickle - maybe getting the data at trickle N that will be needed just after trickle N+1?

ALL the models start with the initial parameters from the data set. The second round of iterations uses the results from the previous iteration. What these will be is unknown at the start, and remains so all the way through. If they WERE known ahead of time, it wouldn't be necessary to run the program in the first place.
Horses for courses. And, if your computer can't stand the heat, take it out of the kitchen. Or something like that. :)
Joined: 5 Aug 04 Posts: 426 Credit: 2,426,069 RAC: 0
Even though the output of previous models is used in future models, it is not as simple as just linking the result file to the input file. It is more a matter of limiting the range of parameters that will be explored in the later projects. Unfortunately all (or at least 3) of the phases must be done on the same computer or the results are unreliable. Phases 1 & 2 are baseline calculations; phases 3, 4, & 5 are experimental stages. The baseline must be established for the experimental work to be valid, and doing this on different hosts is unreliable. An option that was considered was sending workunits with the baseline phases and one experimental phase, but it was rejected because of the waste of CPU time involved (4 extra phases per model). If the baseline calculations could be saved, a later model run on the same host could possibly skip these phases. The big problem with this is that the baseline runs will be different for each parameter set. They may also change with the different types of models; I know slab and sulphur are different, but not whether coupled or future model types would all be different.
BOINC WIKI
BOINCing since 2002/12/8
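A minimal sketch of that baseline-reuse idea, under the constraints the post describes (same host, same parameter set, possibly same model type). Every name here is hypothetical; the post itself says the scheme was only a possibility, never implemented.

[code]
# Hypothetical sketch of saving baseline phases for reuse. A later
# model could skip phases 1-2 only if the same host already computed
# a baseline for the same parameter set and model type.
baseline_cache = {}   # (host_id, model_type, param_set) -> saved state

def phases_to_run(host_id, model_type, param_set):
    key = (host_id, model_type, param_set)
    if key in baseline_cache:
        return [3, 4, 5]        # experimental phases only
    return [1, 2, 3, 4, 5]      # baseline must be redone first

def save_baseline(host_id, model_type, param_set, state):
    baseline_cache[(host_id, model_type, param_set)] = state
[/code]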
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
A more general comment to the project team:

May I repeat River~~'s words here; I could/should have written them myself. May I add: as this dynamic project evolves, I see many members delve deep into their wallets to buy the latest-spec machines, but other members, for various valid reasons, do not, and should not be forced to, or be left with the alternative of leaving the project. It is my expectation that the machines of those members will evolve over time, but at a more sustainable pace than is expected at present.
Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0
I was extending the idea that *if* the phase 1 results are the same for a large number of phase 3-4-5 sulphur runs, those results could be run once and then distributed, at the cost of a 320 MB download. Of course I understand that if the results from the sulphur phases were known we would not be calculating them. Keck's response makes my question irrelevant: at whatever stage the numbers are needed, they must come from the same machine, so the idea of transplanting the data is a no-no, whether it be in one lump or many.

Sorry, I agree with the point but don't find the way you put it at all helpful, Les. Even though there is a smiley after the comment it feels hostile -- which I'm guessing is not what you intended. The old course was suitable for my horses; the new one as currently offered is not. What this thread is about is establishing *whether* the scientists & programmers can come up with a course that is suitable for my horse. So before I leave, what I am asking is:
a) are there plans to bring back a course that is suitable (will there be any more slabs, or runs of similar length to the slabs)?
b) if not, could the project team be persuaded to introduce such plans, and are there any scientifically sensible ways the longer runs can be further subdivided?
If the answer to both is "no" then the best thing I can do (best for me and best for this project) is to detach. On this we are already agreed, Les; if you look back you will see I have said so in more than one posting in this thread. But *only* once I am sure the scientists and programmers are saying a definite "no". And I have had a direct request in another thread from a project person *not* to detach, so I feel entitled to at least *explore* other possibilities before going.
River~~
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
Can an administrator please explain to me why I have to lose all the edited content of a modification I was making? Because of this:

You can no longer edit this post. Posts can only be edited at most 60 minutes after they have been created.

IS THERE A NEED FOR A RESTRICTION LIKE THAT? With the result that I will be busy for another hour or so!
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The one hour restriction showed up a couple of months ago when the software was upgraded to a newer, more secure version. The messages to the left of a typing window appeared then as well. And at first HTML code disappeared, both for new and old messages, but Carl fixed that problem. And I agree that it's a pain. My typing lately has become rather 'lisdexic', and I now have to get it right quickly.

And another thing: there is a bug which prevents you from editing a message, even immediately, if you get an HTML string wrong (by not closing the various parts). When you go back in, the text from the string onwards is split, and scattered over lots of lines. And there is NO "Post reply" button, or "Add sig" box. (Which is why it can't be edited.) It looks like some sort of over-run problem.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
River,
No offence intended. I'm just patiently saying that the project's requirements are what they are, and volunteers have to live with that. CPDN is not open-ended like SETI, etc. I think it may have been envisaged as having a 5-year life, but I'm not even sure when that period started. It wasn't even clear that funding would be available for experiment 2 until about a third of the way into this year. But (on the community forum, where most discussion takes place) one of the project people said that it had been obtained, along with funding for another person for the team, and funding for a REALLY big, 'high resolution', sub-project. Which may or may not be divided up into small bits as you would prefer. No other details have come to light, possibly because of the intense work on spinup (which I'm now running), getting ready for exp 2, and work on the BBC project.

As you are using a large number of computers, one possibility that would help you is if it ever becomes possible to use cluster computing. This gets talked about now and then, but it may need a 64-bit OS to work. Carl said that the idea was possible because of the supercomputer origins of the programs we are running.

If you do decide to leave, keep checking back. Something may turn up. But finding where it is being talked about is the problem.
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0
The one hour restriction showed up a couple of months ago when the software was upgraded to a newer, more secure version.

Thanks for spending some time on this issue. I have cooled down a bit on the other BB forum. But I'm still not in the mood to rewrite my piece; I thought it was good, though - not dyslexic - but I admit that I sometimes struggle to produce English.