Thread 'Schedulers down too now..?'

old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 6588 - Posted: 6 Dec 2004, 8:01:10 UTC
Last modified: 6 Dec 2004, 8:03:21 UTC

"Master File Fetch Failed" for all of my machines right now - looks like the CP scheduler went down at about 6:00am. I've disabled network access on all my machines for the moment, so that they don't back off too far...

Distributed Mania: http://www.nmvs.dsl.pipex.com/
ID: 6588
old_user949
Joined: 20 Aug 04
Posts: 10
Credit: 132,163
RAC: 0
Message 6591 - Posted: 6 Dec 2004, 8:43:28 UTC - in response to Message 6588.  

Same here ...

> "Master File Fetch Failed" for all of my machines right now - looks like the
> CP scheduler went down at about 6:00am. I've disabled network access on all my
> machines for the moment, so that they don't back off too far...
ALL GLORY TO THE HYPNOTOAD!
Need help?
My Stats
ID: 6591
old_user760
Joined: 10 Aug 04
Posts: 94
Credit: 309,849
RAC: 0
Message 6592 - Posted: 6 Dec 2004, 8:50:47 UTC

Ditto, just in time for an ICE BALL to jam up Darwin. Lovely.
ID: 6592
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 6608 - Posted: 6 Dec 2004, 13:17:41 UTC

My machines aren't always lucky on the first try with the scheduler, but a second try eventually goes through...
ID: 6608
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 6613 - Posted: 6 Dec 2004, 14:43:50 UTC - in response to Message 6588.  

The schedulers seem to be up now. New users will want to attach directly to:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc

(instead of the usual climateprediction.net of course)

ID: 6613
old_user17525
Joined: 13 Sep 04
Posts: 161
Credit: 284,548
RAC: 0
Message 6615 - Posted: 6 Dec 2004, 15:06:18 UTC - in response to Message 6613.  

> The schedulers seem to be up now. New users will want to attach directly to:
>
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc
>
> (instead of the usual climateprediction.net of course)

I've now got "master file parse failed", "could not contact any schedulers", and communication deferred for 14+ HOURS.

...only now it's gone up to 1 DAY 17 HRS+

Marj :(((
ID: 6615
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 6617 - Posted: 6 Dec 2004, 15:15:24 UTC - in response to Message 6615.  

Checked with three machines there - everything went fine on the first try...


> I've now got "master file parse failed", "could not contact any
> schedulers", and communication deferred for 14+ HOURS.
>
> ...only now it's gone up to 1 DAY 17 HRS+
>
> Marj :(((
ID: 6617
old_user17525
Joined: 13 Sep 04
Posts: 161
Credit: 284,548
RAC: 0
Message 6618 - Posted: 6 Dec 2004, 16:00:13 UTC - in response to Message 6617.  
Last modified: 6 Dec 2004, 16:02:35 UTC

> Checked with three machines there - everything went fine on the first try...

I daren't check again, it might go up even more!!!


ID: 6618
old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 6624 - Posted: 6 Dec 2004, 18:26:16 UTC
Last modified: 6 Dec 2004, 18:48:24 UTC

I think there's some sort of router problem on 'Janet' - I see 'address unreachable' with a ping plotter traceroute & '100% loss' for a ping. Most of my machines managed to get through, including one final result upload, but two still cannot.

I'm now seeing "Master File Parse Failed" on all machines...

ID: 6624
old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 6633 - Posted: 6 Dec 2004, 19:37:36 UTC - in response to Message 6618.  

> > Checked with three machines there - everything went fine on the first try...
>
> I daren't check again, it might go up even more!!!

Don't worry Marj, you can still force a manual update whenever they come back online properly & that resets the delay back to one minute. I've disabled all my machines' network access again for now though.
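
For anyone wondering why the deferral keeps growing and why a manual update helps, here is a rough sketch of that kind of retry logic. The one-minute starting delay comes from the post above; the doubling on each failure, the two-week cap and the `Backoff` class itself are assumptions made up for illustration, and this is not BOINC's actual code.

```python
class Backoff:
    """Hypothetical exponential backoff with a manual reset (not BOINC's real logic)."""

    def __init__(self, min_delay=60.0, max_delay=14 * 24 * 3600.0):
        self.min_delay = min_delay    # one minute, as mentioned in the thread
        self.max_delay = max_delay    # assumed cap of two weeks
        self.failures = 0

    def current_delay(self):
        """Seconds to wait before the next automatic attempt."""
        if self.failures == 0:
            return 0.0
        # Assumed rule: the delay doubles with every consecutive failure.
        return min(self.max_delay, self.min_delay * 2 ** (self.failures - 1))

    def record_failure(self):
        """Call after a failed scheduler contact; returns the new delay."""
        self.failures += 1
        return self.current_delay()

    def manual_update(self):
        """A forced update forgets past failures and starts again from the minimum."""
        self.failures = 0
        return self.min_delay


backoff = Backoff()
delays = [backoff.record_failure() for _ in range(5)]
reset_delay = backoff.manual_update()
```

Under these assumptions the delay climbs quickly while the servers are unreachable, which is why disabling network access (so no failed attempts are recorded) or forcing an update afterwards both keep the wait short.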

ID: 6633
old_user17525
Joined: 13 Sep 04
Posts: 161
Credit: 284,548
RAC: 0
Message 6639 - Posted: 6 Dec 2004, 20:10:25 UTC - in response to Message 6633.  
Last modified: 6 Dec 2004, 20:14:31 UTC

> Don't worry Marj, you can still force a manual update whenever they come back
> online properly & that resets the delay back to one minute. I've disabled
> all my machines' network access again for now though.

I've got Einstein running as well and that's happily crunching away - if I stop access for cpdn it seems to stop both, and they're only short WUs. I've changed the preferences so cpdn is only doing 1/5 hrs, as it's only got 20 hrs left on it. It keeps trying to get more work, so if this model finishes before it's fixed it won't be doing anything anyway.

(I must say when you've only got a slow machine - it takes me 40ish days/model - it's amazing how paranoid you get about the final upload!)
Marj
ID: 6639
old_user156
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 6647 - Posted: 6 Dec 2004, 20:43:24 UTC - in response to Message 6639.  

> I've got Einstein running as well and that's happily crunching away - if I stop
> access for cpdn it seems to stop both, and they're only short WUs. I've changed
> the preferences so cpdn is only doing 1/5 hrs, as it's only got 20 hrs left on
> it. It keeps trying to get more work, so if this model finishes before it's
> fixed it won't be doing anything anyway.

Luckily the machine that uploaded her final results earlier today already had a new model to crunch on, and 'Jana', due to upload in 24 hours, already has a new model too. Since I'm only crunching CP-boinc at the moment, I increased my queue to 2 days so a server outage ought to be fixed before they run out of work.

> (I must say when you've only got a slow machine - it takes me 40ish days/model -
> it's amazing how paranoid you get about the final upload!)

Final results upload seemed to go okay for 'Alison' but the model still shows as 'ready to report' in the BOINC GUI - I guess that won't clear until she can get through to a scheduler.

ID: 6647
SpaceyCat
Joined: 30 Aug 04
Posts: 7
Credit: 1,554,414
RAC: 0
Message 6674 - Posted: 7 Dec 2004, 6:45:54 UTC

You've said that we can connect to the server at http://climateapps2.oucs.ox.ac.uk/cpdnboinc instead of climateprediction.net. Can this be done while crunching, or will it cause problems with the WUs? And how is this done?

Thanks!
ID: 6674
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 6676 - Posted: 7 Dec 2004, 7:15:34 UTC - in response to Message 6674.  

Attaching to climateapps2.oucs.ox.ac.uk/cpdnboinc instead of the classic address works.


> You've said that we can connect to the server at
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc instead of
> climateprediction.net. Can this be done while crunching, or will it cause
> problems with the WUs? And how is this done?
>
> Thanks!
ID: 6676
old_user949
Joined: 20 Aug 04
Posts: 10
Credit: 132,163
RAC: 0
Message 6678 - Posted: 7 Dec 2004, 7:21:59 UTC
Last modified: 7 Dec 2004, 7:22:27 UTC

From what I see on our team, people sent results yesterday and today. But not me! How is this possible? Am I just unlucky? I did not change anything; BOINC has been running on this slow 2 x P3 800MHz server for weeks.
And I can't connect from my P4 workstation either ... so it is really strange.
ID: 6678
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 6679 - Posted: 7 Dec 2004, 7:27:27 UTC - in response to Message 6678.  

This is what I meant earlier in another thread - only some machines/regions are affected, e.g. first floor vs. second floor in the same house :-)
After a restart, my main machine has the Master file fetch error, the other two are happy...

> From what I see on our team, people sent results yesterday and today. But not me!
> How is this possible? Am I just unlucky? I did not change
> anything; BOINC has been running on this slow 2 x P3 800MHz server for weeks.
> And I can't connect from my P4 workstation either ... so it is really strange.
ID: 6679
old_user2147
Joined: 27 Aug 04
Posts: 55
Credit: 1,106,201
RAC: 0
Message 6680 - Posted: 7 Dec 2004, 7:44:27 UTC

Add me to the list of those having the same exact symptoms as Honza. My symptoms started not long after a reboot, also. Now I'm wary of doing any power cycling on my other machines! :-o
ID: 6680
KeeperC
Joined: 5 Aug 04
Posts: 66
Credit: 2,146,056
RAC: 0
Message 6682 - Posted: 7 Dec 2004, 8:02:15 UTC - in response to Message 6680.  

> Add me to the list of those having the same exact symptoms as Honza. My
> symptoms started not long after a reboot, also. Now I'm wary of doing any
> power cycling on my other machines! :-o

I must be one of the lucky ones. My two machines continue to trickle without any problems.
ID: 6682
old_user3
Joined: 5 Aug 04
Posts: 173
Credit: 1,843,046
RAC: 0
Message 6696 - Posted: 7 Dec 2004, 12:11:58 UTC

The scheduler RPC mechanism should now be working properly.
You can force an update to refresh stats, amongst other things,
and clear those annoying warning messages.

ID: 6696
old_user949
Joined: 20 Aug 04
Posts: 10
Credit: 132,163
RAC: 0
Message 6697 - Posted: 7 Dec 2004, 12:28:18 UTC
Last modified: 7 Dec 2004, 12:28:56 UTC

I still have the same error ...
climateprediction.net - 2004-12-07 14:00:22 - Master file parse failed
climateprediction.net - 2004-12-07 14:00:22 - Could not contact any schedulers for http://climateprediction.net/.
climateprediction.net - 2004-12-07 14:00:22 - Could not contact any schedulers for http://climateprediction.net/.
climateprediction.net - 2004-12-07 14:00:22 - Deferring communication with project for 1 weeks, 5 days, 13 hours, 53 minutes, and 9 seconds
(times are in GMT+1)
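
To put a number on a line like the last one, a small Python sketch can total up the deferral. The function name `parse_deferral` and the regular expression are illustrative assumptions based only on the message format shown above; other client versions may word the message differently.

```python
import re
from datetime import timedelta

# Seconds per unit, keyed by the singular form used in the log message.
_UNIT_SECONDS = {"week": 604800, "day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_deferral(line):
    """Return the deferral in a 'Deferring communication' line as a timedelta, or None."""
    if "Deferring communication" not in line:
        return None
    total = 0
    # Match pairs such as '1 weeks' or '53 minutes'; the trailing 's' is optional.
    for amount, unit in re.findall(r"(\d+)\s+(week|day|hour|minute|second)s?", line):
        total += int(amount) * _UNIT_SECONDS[unit]
    return timedelta(seconds=total)


log_line = ("climateprediction.net - 2004-12-07 14:00:22 - Deferring communication "
            "with project for 1 weeks, 5 days, 13 hours, 53 minutes, and 9 seconds")
deferral = parse_deferral(log_line)
```

For the line above this comes to just over twelve and a half days, which is how long the client would wait if left alone.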
ID: 6697