Message boards : Number crunching : Run Length - a suggestion
Send message Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0 |
Some time back, MikeMarsUK said in a message in another thread: "Running the existing 160-year model is effectively just the same as running four sequential 40-year models - at 1960, 2000, and 2040 the model uploads a 'restart dump' to the server, so you can abort the model after the upload if you need to."

My suggestion is that the project consider implementing a 'run length' option in the project-specific prefs, based not on model years but on run time on the user's box. Rosetta have done this, and you can choose settings from 1 hr to 1 day of runtime. When the app gets to a point where it makes scientific sense to stop, it checks the runtime so far, checks how many chunks it has done, and works out whether it has time for one more within the user's preferred run time.

This works well on Rosetta, and a nice side effect is that you can change the run length of a task while it is running. Say it has got to 5 hrs and you cut the length to 2 hrs; remember to Update the project on the client, and the next time the app gets to a suitable break point it will finish gracefully. There are clear warnings to participants - the software always runs at least one chunk, so on a slower box the 1 hr option very often overruns. Also, because it only ever stops at the end of a chunk, sometimes the run length falls quite a bit short of the user's request, and sometimes it goes over (where for some reason the last chunk ran longer than the average of the chunks before it).

The equivalent here would be to offer a range of options - 1, 2, 3, 4, 6, 9, 12, 15, 20, 24, or 30 months, say. The user would select one of these, and in the usual BOINC way could have different settings for "home", "work", etc. The app would be modified slightly: at each 40-year boundary it would test whether it had time to run another 40 years of model time within the user's limit. If not, it would terminate.
The advantage of this scheme is that it is user-friendly - the user sets what they want to happen. The settings automatically apply to the next kind of work you issue, and you do not even have to work out how many chunks correspond to the user's request, because it is all done post hoc in the app. That is why it would make sense to have the range of run times extend beyond the current deadlines. The project would also be able to run an SQL query on the database to find out which run lengths are popular and which are not - useful when deciding what length of work units to produce next. On Rosetta there is a thread of "top ten reasons I crunch Rosetta", and the ability to choose run length comes in many people's top ten. It is likely (in my opinion) to be even more popular here. River~~ |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The models now running are of a different nature to the old slab and sulphur models. One of the changes is that there is a big upload of 'restart files' every 40 years. When the software is written and tested, these uploads will allow models to be started part-way through a run - but not right now. On the front page: Completed HadCM3L Transient Runs: 1,890. So we're getting there, with or without people with unsuitable computers. And there have been a few in the last day or so, both here and on the BBC forums, who have had a good whinge and then left. |
Send message Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0 |
The models now running are of a different nature to the old slab and sulphur. Quite so. In due course it will be possible to restart these models from the 40-year uploads - a server-based checkpoint. What I am suggesting is the other half of that option: allowing a user to exit cleanly at one of those checkpoints. It is a very small modification - a few lines of code added immediately after the server checkpoint to test whether the model should go back into the next 40-year block. I am also suggesting that it could make sense to release this "early retirement" option even before the code for the part-runs is ready. The advantages over simply suggesting that people abort are:
- it feels like the project is allowing the variable lengths, which as I say is very good for participant morale
- less chance of silly errors (like aborting in year 39 or year 70)
- the software makes a clean exit at the most appropriate place
- less chance of aborting at the wrong place because a participant does not understand the different models (e.g. aborting a sulphur model at 41 years is not the way to go)
- future models may have natural break points at different places, and the method I'm suggesting means those break points will work seamlessly with the current ones, breaking at whichever of their own break points most closely matches the participant's wishes
- no wasted processing time, as there is with the "wait for year 41 and abort" method
I don't think it is useful to describe feedback as "whingeing" - nor to refer to people whingeing in response to a post that does not contain a complaint of any kind, just a suggestion about how to do something your users have been asking for. Maybe some of those other threads can be described in those terms, but not every suggestion is a moan in disguise; I hope mine wasn't taken as such. Yes, it is great that nearly 2k runs have completed, and it is great that the majority of CPDN crunchers are happy to run the full 160 years - congratulations to them for their dedication. But if you can keep another 10% of crunch power by writing a dozen extra lines of code, if you can turn the departing disaffected into grateful ambassadors for CPDN, surely that is worth it. I'd like to see this project do even better than it already is, and am passing on an idea that worked well in another project. I hope the project programmers will consider the idea on its own merits. R~~ |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Sorry, bad wording I guess. I wasn't thinking about you, but about people like this. On the other hand, this person has changed his mind and, for the moment, is staying. And somewhere there's a thread labelled: I'm out of here. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
In my experience, when we get complaints it's almost always from frustration that the models are crashing, and rarely because the workunits take so long. We've put a lot of work into making information about the precautions necessary for model survival readily available. If you don't take the precautions, you're almost as likely to crash a 40-year model as a 160-year model, because most crashes occur in the first days or weeks of crunching. But once you understand the precautions and know that your model is highly likely to complete, you may as well have the satisfaction of seeing how your complete model evolves climatically - many members are as interested in the climate as in the crunching. The researchers in Oxford really do seem to prefer models completed on a single machine, but as soon as the 40- or 80-year chunks of incomplete models beginning at the restart dumps become available, I'm sure that will suit some members better. Let's face it, whichever way you cut it, crunching CPDN can be tough when you first start, but most members, once they've learned the basics, are in it for the long haul. Cpdn news |
Send message Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0 |
Sorry, bad wording I guess. Fair enough, Les - combined with me being a little over-touchy too, I guess. R~~ |
Send message Joined: 16 Oct 06 Posts: 13 Credit: 8,437 RAC: 0 |
Thanks for your comments, but I'm afraid you greatly underestimate the work involved in doing what you described. There's nothing regarding climate models that can be done in "a dozen lines of code"! :-) What is really needed, and would take weeks of full-time work (as if we have it), is a system for recovering and automatically processing uploaded restart dumps, really checking that they are OK for resending. That would make full use of our 40-year workunits (which we're capable of now on the client side, but haven't put into practice other than in beta tests). |
Send message Joined: 3 Mar 06 Posts: 96 Credit: 353,185 RAC: 0 |
Let's face it, whichever way you cut it, crunching cpdn can be tough when you first start, but most members, once they've learned the basics, are in it for the long haul.

I've crunched most of the BOINC projects and in my opinion CPDN is one of the easiest. I don't see a difficult learning curve or tough initial phase at all. Maybe I've been lucky? Maybe I haven't been here long enough? (I'm 70% into a CPDN unit and 2 model days into a BBC unit, that's all.) I dunno - it seems like you attach, you get a work unit, you crunch. As long as you send trickles you get credits. Can it get any easier?

Well, yes it can. Considering they have programmers on staff who know the inner secrets of the code, CPDN should provide a nice auto backup-and-restore proggie. It would be the least they could do for the good folks who donate so much to CPDN. With a really nice backup/restore proggie supported and updated by the project, rather than stuff donated by crunchers who may not be here tomorrow, folks really should have NO problem crunching long CPDN models.

Crunchers want to know their work is useful. Credits aren't enough. It's not enough to say "if you get to the 40-year mark then we can use that". People want more - they want to do the full 160 years. Having a top-quality backup/restore program would be very reassuring for a lot of people. Whenever I hear CPDN being discussed, one of the first three comments is usually "what if the model or my computer crashes?". Address that concern and you will get more crunchers.

The beauty of CPDN is a direct result of the length of the models. Just 1 WU download and you're good for several months. I love it!! Instead of verifying every day that 40-odd results were received and validated at several projects (and if not, why not), I read my BOINC Messages and see if a trickle-up happened. Takes 20 seconds instead of 20 minutes or more. If the CPDN servers went offline for 3 weeks my 'puters would still have CPDN work.
Go for 200 year models if ya wanna, I would gladly crunch those and I know lots of other folks who would too. |
Send message Joined: 27 Jun 05 Posts: 74 Credit: 199,198 RAC: 0 |
Thanks for your comments, but I'm afraid you greatly underestimate the work involved in doing what you described. There's nothing regarding climate models that can be done in "a dozen lines of code"! :-)

I don't think I have made myself clear, then. What I am suggesting would be a small number of lines of code: adding something like the following immediately after the 40-year upload point:

cpu_per_model_year = cpu_so_far / model_years_done;
cpu_for_next_chunk = 40 * cpu_per_model_year;
cpu_after_next_chunk = cpu_so_far + cpu_for_next_chunk;
if (cpu_after_next_chunk > user_cpu_limit) then quit;

where "quit" is a break, jump, return, signal, or setting of an appropriate status code, as appropriate, to make the program exit early instead of going back into the next 40 years. This encodes the user action MikeMars was suggesting, but allows it to happen safely and at exactly the right moment: no losing 39 model-years by getting the timing wrong, and no aborting after the work is done but before the upload is complete.

The above was what I was saying would be a small amount of code. Deciding exactly where to place that code, and testing it, may well be harder; I was not intending to suggest this was the whole job. In addition, of course, there would be a need to add a cpu_target to the CPDN preferences, and a column in the db for it. I am under the impression that the code to pick it up already exists in the BOINC code; if not, the code to pick up a project-specific pref certainly exists at Rosetta.

What is obviously a huge amount of work is the programming to then pick up those part-complete runs and complete them on other hosts. But my impression from other threads is that you intend to do that anyway. So making the changes that allow user flexibility might be only a dozen lines of new code more than you plan to write anyway. R~~ |
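As a minimal sketch of the check River~~ describes, here it is as a self-contained function. The variable names (cpu_so_far, user_cpu_limit, etc.) follow the post's pseudocode but are otherwise hypothetical - this is an illustration of the logic, not the actual CPDN client code:

```python
def should_run_next_chunk(cpu_so_far, model_years_done, user_cpu_limit,
                          chunk_years=40):
    """At a 40-year checkpoint, decide whether another chunk fits
    within the user's preferred CPU-time budget.

    Projects the cost of the next chunk from the average cost per
    model year so far, then compares against the user's limit.
    """
    cpu_per_model_year = cpu_so_far / model_years_done
    cpu_for_next_chunk = chunk_years * cpu_per_model_year
    return cpu_so_far + cpu_for_next_chunk <= user_cpu_limit

# e.g. the first 40 model-years took 100 h of CPU with a 250 h budget:
# a second chunk (projected total 200 h) fits, a third (300 h) would not.
```

Note the same caveat the post makes for Rosetta: because the decision is taken only at chunk boundaries, the actual run may finish well under the budget, or overrun it if the final chunk turns out slower than the average of the chunks before it.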
Send message Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Word, Dagorath. This describes very well how I feel about the project. And of course, being a CPDN noob, the first question I ever posted here was "what happens if I crash a WU?" ;-) Well, it recently happened out of the blue on my box on Einstein and HC, due to something as stupid as an outdated sound driver. I was very relieved to hear that one could simply back up the files in the project folder and they would include all the information needed to carry on with the model. For me, personally, making manual backups is no problem. I'm used to backing up (or is it backuping? Sorry, I'm not a native speaker...) just about everything you can put on a webserver, plus part of my private/local data too, so it is easy to fit into my daily routine. But of course I'm a bit more into IT than most people are, and I totally understand if people don't have the time/motivation/knowledge to do manual backups themselves - that should be no reason to let their work go to waste. So something like an auto-backup is a very good idea... |
Send message Joined: 3 Mar 06 Posts: 96 Credit: 353,185 RAC: 0 |
So to make changes that allow user flexibility might be only a dozen lines of new code more than you plan to write anyway. It seems like the number of lines of code required isn't the point. The point is that the project scientists want each 160-year unit crunched on one CPU from start to finish. I suppose it has something to do with reducing the number of variables, differences in the way floating-point ops are handled, etc. Convince the scientists that it's OK to split the 160-year units into 4 x 40-year units and then you might see that kind of user flexibility.

@ Annika, I have always said "backing up" because it seemed like proper English for some reason. But I like the way "backuping" rolls off the tongue. Henceforth I shall be backuping rather than backingup :) |
Send message Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
*lol* Okay then. Looks like there is at least some "flexibility" concerning the semantic aspect of the project, rather than the technical and scientifical ;-) But btw... what bothers me more than the overall runtime of a project (I don't mind whether I crunch 20 short WUs or a single long one) is the question of how much memory it will take later on. I have read that the models get bigger as they come closer to being finished. While I'm not worried at all about HDD storage space (this PC being new, plus I store most of the less important data on an external HDD), it makes me wonder if it will take more and more memory too, and how much that would be. Can anyone share some experience of that? You know, BOINC always runs in the background, so I find it kinda annoying if my memory gets completely stuffed... |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
They don't take any more memory at the end than at the start. Memory use will jump while the graphics are being displayed, but will shrink again once the graphics window is shut (another good argument for not running the screensaver). I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Thanks a lot. The current usage is something I can easily live with, even when I'm gaming or doing hardware-demanding work. And although I like watching the graphics occasionally, I would never dream of letting them run by default or when I need the power - after 2 years of making do with an underpowered laptop, I know how to use my resources efficiently ;-) |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
It's best to at least suspend it when playing games or doing demanding work - more to protect the model than anything else! :-) Use 'exit' instead if you're doing something that will take a long time (like video encoding), or doing a backup or disk defragmentation. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 13 Oct 06 Posts: 60 Credit: 7,893 RAC: 0 |
Certainly not when I'm playing games that use only 60 or 70 percent of my CPU ^^ what a waste of flops. From experience I know how stable my system is, and this behaviour has so far caused only one crash affecting BOINC (in about a year), so I have gained much more in computing time than I have lost. Don't worry, I back up regularly, at least after every trickle, so it's a calculated risk I'm taking. Of course I exit before I back up or defrag ;-) or even run my antivirus. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Lots of members would love an automatic one-button backup facility, but we're not getting one. There is a choice of backup methods in the backup README here: http://bbc.cpdn.org/forum_forum.php?id=4 Annika, you need to back up the entire BOINC folder, not just the climate model files. Cpdn news |
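The manual routine the thread describes (exit BOINC, copy the whole BOINC folder somewhere safe) is simple enough to script. A minimal sketch follows; the two paths are hypothetical examples and must be adjusted to your own installation, and BOINC should be exited (not merely suspended) before running it, per the advice above:

```python
import shutil
import time
from pathlib import Path

# Hypothetical paths -- adjust to your own installation.
BOINC_DIR = Path("C:/ProgramData/BOINC")
BACKUP_ROOT = Path("D:/boinc_backups")

def backup_boinc(boinc_dir=BOINC_DIR, backup_root=BACKUP_ROOT):
    """Copy the entire BOINC folder (not just the climate model files)
    to a timestamped backup directory, and return that directory.

    Exit BOINC first so no files change mid-copy.
    """
    dest = Path(backup_root) / time.strftime("boinc_backup_%Y%m%d_%H%M%S")
    shutil.copytree(boinc_dir, dest)
    return dest
```

Restoring is the reverse: exit BOINC, copy the timestamped folder back over the BOINC directory, and restart. Keeping each backup in its own timestamped folder means a crash during one backup never destroys the previous good copy.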
©2024 cpdn.org