task run time confusion
Joined: 21 May 07 Posts: 15 Credit: 5,190 RAC: 0
I do not understand what has been happening with this project's tasks. This computer is a Core 2 Duo machine with a 2.5 GHz CPU and 2 GB of DRAM. A couple of days ago, I saw that the current task had 260 hours left to run. It was about 50% finished. This computer runs four tasks, two alternating on each core, and tasks are set to switch every 60 minutes. The computer runs 132 hours a week, so 33 hours on each task each week. That means the task should have needed just under 8 weeks to finish. Now, this morning, I see that I have a task with over 400 hours to go, only a couple of percent finished. So, I do not understand what is happening, and I have detached from the project. If someone can explain to me what is happening, then I might re-attach. I think that this is an important project and I would like to be on it, but I have to know what is going on. Any help will be appreciated.
>>RSM
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0
Hi,
Which model was it that was displaying the problem? It may be that it was a HadSM3 model which turned into a slow iceworld (in which case the best thing to do is to abort that particular model). Not all iceworlds are slow; ones which run at normal speed should be left to run. Alternatively, it could have been a HadCM3 model which was doing a day/month/year retry, in which case it'll either fix itself and continue running normally, or it might automatically abort itself if it decides that its climate is unrecoverable.
Note that detaching from the project will have destroyed the model(s) you were working on. I would recommend going through the 'README' files which are linked from my signature to get more background information.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
Joined: 21 May 07 Posts: 15 Credit: 5,190 RAC: 0
Unfortunately, I cannot tell you which model was running on this machine. But I am running the project with a current task on another fast machine. The statistics on that task closely mirrored those of the task I killed; it was started at the same time. I will look at that task when I get home from work. If it is behaving properly, I will leave it alone. If it behaves like this one, then I will come back to you. Many thanks for your help.
>>RSM
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
Hi Mitrichr,
One of your previous models crashed with a 107 exit code. In the README about crashes and other problems, it would be worth looking at item #5 by Mike. Lots of ideas and suggestions there.
Cpdn news
Joined: 21 May 07 Posts: 15 Credit: 5,190 RAC: 0
> Hi Mitrichr

Hey, thanks, I will look at it.
>>RSM
Joined: 21 May 07 Posts: 15 Credit: 5,190 RAC: 0
I am not home yet to check my other computer, but here it is in a nutshell: I was running this project on two Core 2 Duo machines with tasks set to switch every 60 minutes. Since the tasks are very long, this project had one quarter of my crunch time for a long time. I don't know if the project has problems, but they should not be visited on people volunteering computer time. There are other projects with no problems on which I could get something done. So you can be sure that if the other machine exhibits the same behavior, I will be gone. Otherwise, you should be able to tell me that I finished a task very recently.
>>RSM
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The relevant word there is volunteer. ALL projects have problems from time to time. It's entirely up to people to decide which projects they run, AND to read the info on each project's web site to see what it's about. This project is well known to stress a computer far more than any of the other projects, because of the complex formulae used in climate models. This runs the floating-point maths unit in a CPU at a constant high rate. Not all computers are suitable for this project, but there are lots of other, simpler projects. As for when or if your models finish, you can see this yourself: just go to the Results page on your account and look. And just in case you think that ALL of the climate models should run to the end without problems, they don't necessarily do this. Part of the project is to find sets of starting values that will cause a model to fail at an early stage. There's only one way to do this, and that's to try these values and see what happens.
Joined: 21 May 07 Posts: 15 Credit: 5,190 RAC: 0
> The relevant word there is volunteer.

Well, thanks, but no thanks. First, I have no problems running CPDN. I have finished tasks. The two machines on which I was running the project are Core 2 Duos, 2.5 GHz, with 2 GB of DDR2 DRAM. I left with 4,000-plus points, whatever that means. Second, it is my opinion that when an organization, CPDN, comes to a market, and we are the market, the organization should have done all of the testing of its product. I am not interested in driving my new Hupmobile out of the showroom only to have the wheels fall off. At WCG the new Cancer project has had problems with fast processors, quads, Xeons, etc., and those guys just said: hey, let us know when you are ready for our power; meanwhile we will go elsewhere and get some work done for someone else. There are just not that many people into this activity, maybe a million. SETI claims about 500,000, and they are 50% of BOINC's activity. WCG claims 300,000. We are interested in getting work done. So, see ya.
>>RSM
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
CPDN models have all been carefully beta tested by experienced crunchers. The precautions required to complete models are detailed in the post I recommended earlier. The same precautions in fact apply to the workunits of all BOINC projects; they're just even more important here, because the longer a task runs on a computer (however good it is), the more likely it is that one of the problems will occur. That's why most CPDN crunchers who successfully complete their models take frequent backups of the entire BOINC folder. If a model crashes, it can then in most cases be restored and continued. I agree that because these models are so long, they require a special sort of dedication on the part of the cruncher.
Cpdn news
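A minimal sketch of the "frequent backups of the entire BOINC folder" routine mentioned above, in Python. It assumes the BOINC client has already been shut down completely, so that no model is writing to its files while they are copied. The function name `backup_boinc_folder` and the timestamped folder naming are hypothetical choices for illustration, not part of BOINC or CPDN.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root):
    """Copy the whole BOINC data folder into a new timestamped subfolder.

    Hypothetical helper. Assumes the BOINC client is NOT running, so the
    model checkpoint files are not being modified during the copy.
    """
    src = Path(boinc_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")

    # One folder per backup, e.g. boinc-backup-20240101-120000, so older
    # backups are kept and a crashed model can be rolled back further.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"

    # copytree creates dest (and any missing parent folders) and copies
    # every file and subfolder, preserving timestamps via shutil.copy2.
    shutil.copytree(src, dest)
    return dest
```

A call would look like `backup_boinc_folder("/path/to/BOINC", "/path/to/backups")`; both paths are placeholders, since the real location of the BOINC data folder depends on the operating system and how BOINC was installed. Restoring would mean copying a chosen backup back over the data folder, again with BOINC stopped.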
©2024 cpdn.org