Message boards : Number crunching : Boinc 4.36 dev version. dwnlded 3 cpdn, what to do
Author | Message |
---|---|
Joined: 29 Mar 05 Posts: 2 Credit: 2,094 RAC: 0 |
Hi, this is my first CPDN post. I usually hang out in the SETI one, as that is where I started with BOINC. But I did have a problem occur with BOINC, and I want to know what to do from a CP standpoint. My system is a 3.6GHz Intel HT. It runs two threads at once, meaning two projects get crunched at once (sorry if you already know this; bear with me, it will make sense why I stated this). I had been running BOINC for a while with CPDN, then had problems, lost the local projects due to a BOINC issue, reattached to CPDN, etc. Finally I was running normally for a while, then the scheduler bugs in the 4.35 dev version downloaded a third CPDN. It was running one just fine, then the 4.35 bug downloaded a second and it was locked into running both, then for some reason it downloaded a third. Now, I won't get done with the third one for quite a few months. So my question is this: what to do? Do I suspend or abort the third one? Who do I notify about this so that someone else can run that third one now? I don't want to be a holdup when it will be a long time before I get to it. I also don't want to get stuck running it 3 or 4 months from now if someone else gets allocated to it in the meantime. Any suggestions from official CPers? |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Very odd things happening in the download area lately... and not just with BOINC 4.3x. Using 4.25, my Win PC downloaded an extra WU right after another one had started. It is single CPU with HT disabled. Using 4.19, my Linux PC downloaded extra WUs right at the beginning of a run. It is currently a P4 crunching two WUs, with two in queue. You can try to abort the 2nd WU, but it may just download a new one. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Just a thought: what message(s) is being displayed around the time of the download? Is it the "may run out of work......" that normally occurs at the end of modelling? Les |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
> Just a thought: what message(s) is being displayed around the time of the > download? > Is it the "may run out of work......" that normally occurs at the end of > modelling? > I didn't see the Linux ones, but the Windows one this morning did give an "insufficient work, requesting more" message. |
Joined: 29 Mar 05 Posts: 2 Credit: 2,094 RAC: 0 |
My windows version gave the same message: > I didn't see the Linux ones, but the Windows one this morning did give an > "insufficient work, requesting more" message. > |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
4.35 has the new scheduler and this is one of the known problems at the moment. If you are not going to be able to finish the model in time, abort it. Far better to kill it early so that it gets back into the pool. As I understand it, we are way early in the process, so a lot of models are expected to fail. I know I have nearly 80 models in my account, yet I have only finished a few successfully; something like 26. There is a "new improved" CPDN in the works, but I don't know if that is going to be used to re-run the "interesting" work units or not. The other problem is that I think we only have Tolu as a developer here ... |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
So long as you have only one WU sitting in the queue (or one per thread for HT machines) then having it there will at least ensure that you are never out of work ;) With CPDN you are allowed a year to complete it, and with so many model permutations it does not matter terribly if some are in limbo. Obviously, if you do find yourself with a couple of dozen in the queue (and during BOINC development similar things have happened) then it makes sense to ditch most of them early. The present experiment is intended to continue until the money runs out. But there will be other experiments, such as the sulphur and coupled ocean ones, particularly for faster machines. It would be a shame to discard partly crunched WUs to switch to these, but it would seem perfectly reasonable to discard uncrunched ones at that stage. All in all, I see no reason to worry about these WUs now. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I finished my 1st 4.12 model 15 hours ago, and got a new one. I'm using BOINC 4.25, and haven't had the early download yet. Maybe when it reaches 33% in 12 days. The 2nd current model is also 4.12, and is well into phase 3; 84%, and 8 days to go. Another thought: has anyone with this problem been running a BOINC version greater than 4.25, and then downgraded again? Perhaps something is getting left behind from the experimental versions. Les |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
I acquired two new WUs for the queue yesterday when I restored a backup and upgraded at the same time to 4.36. What seems to have happened is that BOINC then created two new computer IDs. Models allocated to, and being crunched by, the existing computer were allocated to one of the new IDs which is now being credited with the trickles. Meanwhile, the old ID was treated as being out of work and a new WU was allocated, and the third ID was also allocated a WU. The upshot is that the BOINC manager is showing two active WUs (it is running HT) and two in the queue. Which is fine. But the server is clearly muddled. I shall try merging later. |
Joined: 9 Aug 04 Posts: 25 Credit: 4,756,979 RAC: 0 |
I've got one machine which dl'ed an extra WU which it won't start for maybe a month at the going rate. I know it'll finish in time, but doesn't CPDN assume you trashed it if it doesn't start trickling? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Rumor has it that it's about 6 weeks. However, there are a few people who seem to be saving trickles until the model finishes. I think that this instruction was written back in the pre-BOINC days, and hasn't been amended. All that will happen is that the parameter set will be re-issued, up to a max of 5 times. And if you end up crashing it, maybe the other person will have better luck. Les |