Questions and Answers : Wish list : Work unit taking too long
Author | Message |
---|---|
Send message Joined: 9 Mar 06 Posts: 1 Credit: 566,546 RAC: 0 |
You won't be able to get very many results if the work units take over 2,000 hours each! Right now I have a task with a report deadline of 7/19/2010 and 2768 hours left to complete! Am I the only one? Is this a fluke? Does everyone else see this? What kind of distributed computing can be done if it takes more than a year to work out one work unit? |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Currently there are three different types of model that you can download; they take different amounts of time and have different memory requirements. Obviously the CPU will also affect the speed of the model. For more information, see the 'README - running the model' (link via my signature); the first couple of posts within the 'information' section discuss the different types of model. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
If you look at the stats on the front page of this site, you'll see that LOTS of climate models of various types have been completed. If you look through the forum topics, you'll see that lots of people get a shock when they find out how long a climate model takes to run. The "deadline" is just there because it's a built-in part of BOINC, and a number is compulsory. But this project ignores deadlines. And it doesn't actually take a year to complete a model, unless one is not serious about the project and is only allowing a small amount of time for the climate models, in favour of work units from other projects. Also, if you look at the climateprediction prefs on your account page, you'll find an option to select a type of model. Some of these are a lot shorter, although one of the short models is also a high-resolution model and requires a lot of RAM. If you don't tick a type, you get issued one at random. It takes me about 3 months for a long model on an Intel P4, and 12 days for a short model on an Intel quad. 1 model just finished, and 7 more in about 2 days. Happy crunching. Backups: Here |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Ramgarden, welcome to the forum. The HADSM model on your AMD could take 3 or 4 weeks to complete depending on how many hours a day you let the computer crunch. As you'll see if you go to the posts Mike suggested, the 160-year HADCM on your Intel is the longest type of model. Yes, they are massive models and you could call this extreme distributed computing. I'm running one on my Intel. By the time it completes it will have crunched for about 2560 hours, which is a bit more than 3 months running 24/7. There's no need to run these models 24/7 if you don't want to. The researchers will be needing more of these models for quite some time, which is why they have a long deadline for completion. Even if you overrun the deadline it doesn't matter, as the CPDN servers accept overdue models and still give you all the credits you've earned. If you disable the model screensaver and instead occasionally view the globe using the View Graphics button in BOINC manager, the model will run faster. To keep a model running for so long without crashing requires a few precautions, mostly easy to implement - you just need to know about them. In the README about crashes and problems, I'd recommend item #5 by Mike, who posted above. It's a comprehensive overview of the precautions. Many of us regularly back up the contents of the BOINC folder so that if the model does crash, we can restore the backup and continue the model. In the README about backups, the first manual method explained by Les is quick and easy. If one of these HADCM models does crash and can't be restored, it will still have sent data useful to the researchers at the end of each model decade. And some crashed models are reissued and continued from part-way through on other computers. Hope that helps. Mo Cpdn news |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
It's even more difficult to complete work under these conditions: <message> Did you perchance sign on to this project without reading anything about what it entails? Please note the utterly meaningless credits to the left of my post. Perhaps I should say meaningless except to indicate that it might actually be possible to run these over-long Models. Tends to suggest that your "You won't be able to get very many results if the work units take over 2,000 hours each!" might be ill-advised, eh? You might also check the amount of work done by the tens of thousands of participants (your ID# will give you a clue as to how many). Part of it is shown in the "Project Stats" link in the blue, left. Your reaction is typical of those who jump into the water without testing the temperature. We hope that you will find the temperature is reasonable and that you can participate and gain the satisfaction that so many CPDN participants feel for doing an important job -- and seeing a long Model through to completion. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 21 Nov 06 Posts: 20 Credit: 318,377 RAC: 0 |
We all have our bad hair days. I've not many credits going for this project because time and again the model goes and breaks off, be it at 100 hours or 1681 hours as one did. Make it robust and fix it so it will be able to restore without the hoopla stuff contributors have to go thru. Also fix the due date issue and set it to something that will not cause BOINC to go into Earliest Deadline First mode when that 'fictitious' deadline is approaching. 2 cents |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Sekerob. The deadline date for all models currently being issued has been lengthened considerably, which should help multi-project crunchers. Backing up the BOINC folder regularly shouldn't be difficult even for newbies, who can use Les's easy manual method, which only takes a few minutes. It's the first method described in the README collection about backups (link in my sig). Restoring a backup after a model crash is almost always successful in my experience. Les's restore method, described click-by-click, is just as easy. As you're not a BOINC newbie you may prefer to try one of the more sophisticated backup methods! Cpdn news |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Quote: "I've not many credits going for this project because time and again the model goes and breaks off." Your computers are hidden so we can't see the failed Models and their error messages. If you make the machines visible, perhaps we can help you solve the problem(s). "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 21 Nov 06 Posts: 20 Credit: 318,377 RAC: 0 |
That's the way they are going to remain. I know what caused each case except the longest one, which ran 1681 hours, but I am fairly sure that was file corruption on a disk progressively going up the famous creek. Quote: "I've not many credits going for this project because time and again the model goes and breaks off." In the backup / restore procedures there are a number of items I may have overlooked: 1. What if the unattended client already communicated back to CPDN that the model crashed? 2. How do you stop CPDN from communicating a 'crash' message on a multiproject / multicore system? Suspend networking altogether? But then, 3. Running the projects dry takes time on a 4 core, particularly if there are a few days of buffered work and some projects like QMC are running day-long models. Eventually the result upload has to be done. How to be sure the bad CPDN project does not 'tell' that things went bad on the first internet connection? One thing I don't/won't engage in is meddling in the client_state.xml. The wiki suffers from tunnel vision and is long behind on present crunching conditions. Partial quotation: "#5 We don't want BOINC to contact other projects with information about runs that were in progress at the time of the backup but have since been completed, so:" Is this a serious proposition, with the present model 69 hours done and 330 hours to go on a multicore? Maybe CPDN should recommend participating only with single-core machines (ancient, slow, disproportionate amount of electricity use). Don't want to be a PITA, but the solution could be more of the 21st century. cheers Coelum Non Animum Mutant, Qui Trans Mare Currunt |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... Doesn't matter - you do get a 'result refused' message when the result reports for the second time, but this makes no difference. Similarly the original error report will stick on the website, but this isn't important (trickles are still received, credit still generated, and the scientific data still collected). Quote: "2. How do you stop CPDN from communicating a 'crash' message on a multiproject / multicore system? Suspend networking altogether? But then," As long as you don't mind that the web site claims that the result has crashed out (which has no effect on subsequent processing) you needn't bother to try to block the crash message.
As you have noted, the wiki is very out of date now that Paul D Buck has stopped maintaining it. We're trying to maintain more up-to-date info on the forums instead. I don't know why the wiki says there is a problem with sending duplicate 'finished' reports to other projects. I've never tried that - is it a real problem? Won't the duplicate simply be rejected as 'already received'?
BOINC is quite awkward when it comes to restoring single models from multicore systems. I've raised this in /trac + bugzilla reports, but so far they've been ignored. The easiest thing to do is just to restore everything together, although this results in some CPU time being wasted for the other climate models on the same host. RRodway wrote an automated backup system so that you can take daily backups without having to spend time at each host. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
My solution for a model crashing on a quad core is to just let it stay dead. There are millions more combinations to try. Most of the problems people have are caused by either operator ignorance (e.g. not shutting down BOINC before 'pulling the plug') or hardware problems, possibly the most common being a power supply unit that is too small, as people try to use anything they can get their hands on to run science apps that are too 'big' for the computer. As Mike said, the Wiki is 'dead', Long Live the README files. :) |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Has anyone got a link to the Wiki page Sekerob referred to? (I've never learned to navigate the unofficial BOINC Wiki and only find random pages by chance... and rarely find my way back to the same pages.) Maybe some of us should learn how to Wiki-edit or at least delete UBW stuff that's no longer advisable, or provide links to README posts? Mike said: "I've raised this in /trac + bugzilla reports, but so far they've been ignored." My comments on this Trac ticket (unrelated to the content of this thread) have also been ignored for months, though I think I asked for something that should matter to us all. When you make a backup of the BOINC folder contents, it does help if you haven't got 100+ tasks from other projects in progress or waiting to be crunched (yes, this is possible!), because if you restore the backup you need to know which tasks have already been completed and therefore must be aborted to avoid crunching them a second time. So backups are easier if you don't also crunch other projects that send large numbers of short workunits. This is why some people who have more than one computer reserve a particular computer or computers to crunch only CPDN. Cpdn news |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
It's here: http://www.boinc-wiki.info/Main_Page I also find it impossible to navigate. There are 1,300 pages ... UBW? I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
UBW? Unofficial BOINC Wiki! At first I thought it might be a cricketing term meaning umbilicus before wicket. Cpdn news |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
|
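
A rough way to sanity-check the run times quoted in this thread is to convert CPU hours into calendar days for a given number of crunching hours per day. The sketch below is only illustrative: the function name `days_to_finish` and the 8 hours/day figure are assumptions made for the example, while 2,560 and 2,768 hours are the figures quoted in the posts above. It ignores differences in CPU speed and any time shared with other projects.

```python
def days_to_finish(cpu_hours_left: float, hours_per_day: float = 24.0) -> float:
    """Calendar days needed to crunch the remaining CPU hours.

    hours_per_day is how long the model actually gets the CPU each day
    (24 means running around the clock).
    """
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be greater than 0 and at most 24")
    return cpu_hours_left / hours_per_day


if __name__ == "__main__":
    # Hour figures taken from the thread; the 8 h/day case is a made-up example.
    for label, hours in (("2560 h model", 2560), ("2768 h remaining", 2768)):
        for per_day in (24, 8):
            days = days_to_finish(hours, per_day)
            print(f"{label}: about {days:.0f} days at {per_day} h/day")
```

At 24 hours a day, 2,560 hours works out to roughly 107 days, in line with the "a bit more than 3 months" estimate above. The 2,768-hour task only stretches past a year if the model gets less than about 7.6 hours of CPU time per day.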
©2025 cpdn.org