climateprediction.net (CPDN) home page
Thread 'Boinc 4.36 dev version. dwnlded 3 cpdn, what to do'

old_user67716

Joined: 29 Mar 05
Posts: 2
Credit: 2,094
RAC: 0
Message 12350 - Posted: 6 May 2005, 18:38:56 UTC

Hi,
This is my first CPDN post.
Usually hang out in the SETI one, as that is where I started with BOINC.
But, I did have a problem occur with BOINC, and I want to know what to do from a CP standpoint.
My system is a 3.6GHz Intel HT.
It runs two threads at once, meaning two projects get crunched at once. (Sorry if you already know this; bear with me, it will make sense why I stated it.)
I had been running BOINC for a while with CPDN, then had problems, lost the local projects due to a BOINC issue, reattached to CPDN, etc.
Finally I was running normally for a while, then the scheduler bugs in the 4.35 dev version downloaded a third CPDN work unit.
It was running one just fine, then the 4.35 bug downloaded a second and it was locked into running both; then, for some reason, it downloaded a third.
Now, I won't get done with the third one for quite a few months.
So my question is this: what to do?
Do I suspend or abort the 3rd one?
Who do I notify about this so that someone else can run that third one now? I don't want to be a holdup when it will be a long time before I get to it.
I also don't want to get stuck running it 3 or 4 months from now if someone else gets allocated it in the meantime.

Any suggestions from official CPers?


geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 12352 - Posted: 6 May 2005, 19:12:44 UTC

Very odd things happening in the download area lately... and not just with BOINC 4.3x.

Using 4.25, my Win PC downloaded an extra WU right after another one had started. It is single CPU with HT disabled.

Using 4.19, my Linux PC downloaded extra WUs right at the beginning of a run. Currently a P4 crunching two WUs, with two in queue.

You can try aborting the 2nd WU, but it may just download a new one.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12354 - Posted: 6 May 2005, 19:42:43 UTC

Just a thought: what message(s) are being displayed around the time of the download?
Is it the "may run out of work......" that normally occurs at the end of modelling?

Les
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 12357 - Posted: 6 May 2005, 20:04:49 UTC - in response to Message 12354.  

> Just a thought: what message(s) are being displayed around the time of the
> download?
> Is it the "may run out of work......" that normally occurs at the end of
> modelling?
>
I didn't see the Linux ones, but the Windows one this morning did give an "insufficient work, requesting more" message.
old_user67716

Joined: 29 Mar 05
Posts: 2
Credit: 2,094
RAC: 0
Message 12358 - Posted: 6 May 2005, 20:07:29 UTC - in response to Message 12357.  

My Windows version gave the same message:

> I didn't see the Linux ones, but the Windows one this morning did give an
> "insufficient work, requesting more" message.
>
old_user5994

Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 12359 - Posted: 6 May 2005, 20:08:39 UTC

4.35 has the new scheduler and this is one of the known problems at the moment.

If you are not going to be able to finish the model in time, abort it. Far better to kill it early so that it gets back into the pool.

As I understand it, we are way early in the process, so a lot of models are expected to fail. I know I have nearly 80 models in my account, yet I have only finished something like 26 successfully.

There is a "new improved" CPDN in the works, but I don't know if that is going to be used to re-run the "interesting" work units or not.

The other problem is that I think we only have Tolu as a developer here ...
Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 12364 - Posted: 6 May 2005, 22:00:16 UTC

So long as you have only one WU sitting in the queue (or one per thread for HT machines) then having it there will at least ensure that you are never out of work ;) With CPDN you are allowed a year to complete it and with so many model permutations it does not terribly matter if some are in limbo.

Obviously, if you do find yourself with a couple of dozen in the queue (and during BOINC development similar things have happened) then it makes sense to ditch most of them early.

The present experiment is intended to continue until the money runs out. But there will be other experiments, such as the sulphur and coupled ocean ones, particularly for faster machines. It would be a shame to discard partly crunched WUs to switch to these, but it would seem perfectly reasonable to discard uncrunched ones at that stage.

All in all, I see no reason to worry about these WUs now.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12367 - Posted: 6 May 2005, 22:49:36 UTC

I finished my 1st 4.12 model 15 hours ago, and got a new one. I'm using BOINC 4.25, and haven't had the early download yet. Maybe when it reaches 33% in 12 days.
The 2nd current model is also 4.12, and is well into phase 3; 84%, and 8 days to go.

Another thought: has anyone with this problem been running a BOINC version later than 4.25, and then downgraded again? Perhaps something is getting left behind from the experimental versions.

Les

Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 12376 - Posted: 7 May 2005, 11:43:09 UTC

I acquired two new WUs for the queue yesterday when I restored a backup and upgraded to 4.36 at the same time. What seems to have happened is that BOINC then created two new computer IDs. Models allocated to, and being crunched by, the existing computer were reassigned to one of the new IDs, which is now being credited with the trickles. Meanwhile, the old ID was treated as being out of work and was allocated a new WU, and the third ID was also allocated a WU.

The upshot is that the BOINC manager is showing two active WUs (it is running HT) and two in the queue. Which is fine. But the server is clearly muddled. I shall try merging later.
old_user733

Joined: 9 Aug 04
Posts: 25
Credit: 4,756,979
RAC: 0
Message 12519 - Posted: 11 May 2005, 23:09:33 UTC

I've got one machine which dl'ed an extra WU which it won't start for maybe a month at the going rate. I know it'll finish in time, but doesn't CPDN assume you trashed it if it doesn't start trickling?


Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12520 - Posted: 11 May 2005, 23:22:37 UTC

Rumor has it that it's about 6 weeks.
However, there are a few people who seem to be saving trickles until the model finishes. I think that this instruction was written back in the pre-BOINC days, and hasn't been amended.
All that will happen is that the parameter set will be re-issued, up to a max of 5 times. And if you end up crashing it, maybe the other person will have better luck.

Les



