Thread 'Trickles update hanging/delayed again?'

old_user3144
Joined: 30 Aug 04
Posts: 50
Credit: 237,894
RAC: 0
Message 12423 - Posted: 9 May 2005, 10:23:08 UTC

Hello there,

I'm seeing an increasing delay between trickles being reported and their display in the user account. For example, I have a model at 35.86%, at step 19768 in phase 2, whose last displayed trickle is from step 216040 of phase 1, reported on 2nd May. I'm starting to get worried, because BOINC shows no errors when reporting the trickles.

Has anyone else observed this?

greetz, Uli

ID: 12423
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 12424 - Posted: 9 May 2005, 10:33:31 UTC

Yes, see for example "Server data inconsistent" on the classic board:
http://www.climateprediction.net/board/viewtopic.php?t=2934
phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board
ID: 12424
old_user26115
Joined: 21 Oct 04
Posts: 24
Credit: 207,633
RAC: 0
Message 12425 - Posted: 9 May 2005, 12:31:13 UTC
Last modified: 9 May 2005, 12:49:43 UTC

Yes, I noticed this too.
My last trickle was on 05.08.05 at 00:15 UTC; since then I have reached at least two more 'milestones' that should have trickled, but nothing happened.
(Just for information!)
Reported but not trickled:
08.05.2005 20:33:46|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
08.05.2005 20:33:47|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded

and
09.05.2005 13:55:36|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
09.05.2005 13:55:37|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded


greetz from Switzerland
littleBouncer

ID: 12425
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 12440 - Posted: 9 May 2005, 17:36:26 UTC - in response to Message 12423.  
Last modified: 9 May 2005, 17:48:29 UTC

It looks like the server ran out of space. I cleaned up old images, so hopefully that will give it room for a while. I also restarted the trickle-handling processes, so that should re-flush everything. Trickles/credits are based on the latest trickle received, so even if you have a "gap" of missing trickles, the next one will give the correct credit.
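
To illustrate why a gap does not matter, here is a minimal Python sketch of "credit from the latest trickle". It is an illustration under stated assumptions, not the actual CPDN server code: `Trickle`, `CREDIT_PER_TIMESTEP`, and `steps_per_phase` are hypothetical names and values that do not come from this thread.

```python
from dataclasses import dataclass

# Hypothetical conversion factor; the real project value is not given here.
CREDIT_PER_TIMESTEP = 0.01


@dataclass(frozen=True)
class Trickle:
    phase: int     # model phase (1, 2, ...)
    timestep: int  # timestep within that phase when the trickle was sent


def latest_trickle(received):
    """Return the furthest-along trickle, or None if nothing arrived."""
    return max(received, key=lambda t: (t.phase, t.timestep), default=None)


def credit_from_latest(received, steps_per_phase):
    """Credit depends only on the latest trickle, so gaps are harmless."""
    latest = latest_trickle(received)
    if latest is None:
        return 0.0
    # Total timesteps completed up to the point of the latest trickle.
    total_steps = (latest.phase - 1) * steps_per_phase + latest.timestep
    return total_steps * CREDIT_PER_TIMESTEP
```

Because only the maximum is used, a host that lost several intermediate trickles ends up with the same credit as one whose trickles all arrived, as soon as the next one gets through.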
ID: 12440
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 12441 - Posted: 9 May 2005, 18:39:36 UTC

Looks like it's starting to catch up now. Thanks Carl.
ID: 12441
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 12442 - Posted: 9 May 2005, 18:56:09 UTC - in response to Message 12441.  

> Looks like it's starting to catch up now. Thanks Carl.

It could very well break down again; the servers are maxed out on space and the database is running very slowly, so a lot of inserts & updates are timing out because the row locking takes so long, etc. It's something that they'll have to get someone at Oxford to work on (Tolu is away on holiday and already overworked anyway).
ID: 12442
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 12446 - Posted: 9 May 2005, 20:08:19 UTC - in response to Message 12442.  

> (Tolu is away on holiday and <b>already overworked anyway)</b>
>
That qualifies for understatement of the year.
ID: 12446
old_user3144
Joined: 30 Aug 04
Posts: 50
Credit: 237,894
RAC: 0
Message 12448 - Posted: 9 May 2005, 23:13:04 UTC

It works now, kinda: The missing trickles appeared :D

But when BOINC now contacts the server, I get this in the BOINC log:
climateprediction.net - 2005-05-10 00:14:03 - Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
climateprediction.net - 2005-05-10 00:14:05 - Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed
climateprediction.net - 2005-05-10 00:14:05 - No schedulers responded
climateprediction.net - 2005-05-10 00:14:05 - Deferring communication with project for 13 minutes and 39 seconds
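
For anyone who wants to scan their own log for this pattern, here is a minimal Python sketch. It assumes the `project - timestamp - message` layout of the lines above; `parse_line` and `scheduler_outcomes` are hypothetical helper names, not part of BOINC.

```python
import re
from datetime import datetime

# Matches lines shaped like: "<project> - YYYY-MM-DD HH:MM:SS - <message>"
LINE_RE = re.compile(
    r"^(?P<project>\S+) - "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into (project, timestamp, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
    return match["project"], timestamp, match["msg"]


def scheduler_outcomes(lines):
    """Collect (timestamp, 'succeeded' | 'failed') for scheduler RPC lines."""
    outcomes = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        _, timestamp, message = parsed
        if not message.startswith("Scheduler RPC to "):
            continue
        if message.endswith(" succeeded"):
            outcomes.append((timestamp, "succeeded"))
        elif message.endswith(" failed"):
            outcomes.append((timestamp, "failed"))
    return outcomes
```

Feeding it the four lines above would yield a single "failed" entry, which makes it easy to count failures over a longer log.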

...and this in my Proxomitron log:

+++GET 146+++
POST /cpdnboinc_cgi/cgi HTTP/1.0
Pragma: no-cache
Cache-Control: no-cache
Host: climateapps2.oucs.ox.ac.uk:80
Content-Type: application/octet-stream
Content-Length: 4197
Connection: keep-alive
Browser reload detected...
Posting 4197 bytes...

+++RESP 146+++
HTTP/1.0 500 Internal Server Error
Date: Mon, 09 May 2005 22:10:33 GMT
Server: Apache/2.0.50 (Unix) PHP/4.3.8
Content-Length: 537
Connection: close
Content-Type: text/html; charset=iso-8859-1
+++CLOSE 146+++

I think the server now has a different kind of problem.
I don't want to sound too critical ;)
I'm a developer myself and know the job is hard :)

greetz, Uli

ID: 12448
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 12452 - Posted: 10 May 2005, 3:54:49 UTC
Last modified: 10 May 2005, 6:55:03 UTC

The database is at 100% disk space utilization, so it's just thrashing away and has no room for new requests, trickles, etc. I've told the relevant (remaining!) people at Oxford, so hopefully they can get someone on it, as it's a bit more work than I can do from here (and for free! :-)
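
A cheap guard against this kind of failure is to watch disk utilization and warn before it reaches 100%. Below is a minimal sketch using Python's standard `shutil.disk_usage`; the 90% threshold and the function names are illustrative assumptions, and nothing here reflects how the CPDN servers are actually monitored.

```python
import shutil


def disk_utilization_percent(path):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on unusual filesystems
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """True once utilization reaches the (assumed) warning threshold."""
    return disk_utilization_percent(path) >= threshold_percent
```

Run periodically against the database volume, a check like `is_nearly_full("/path/to/db")` (path hypothetical) would flag the problem while there is still room to clean up.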
ID: 12452
old_user18746
Joined: 17 Sep 04
Posts: 25
Credit: 196,284
RAC: 0
Message 12453 - Posted: 10 May 2005, 7:05:53 UTC


I still have one missing, which I uploaded on the 8th. I'm suspending CPDN until this issue is resolved, as I have seen all sorts of weird things happen in other projects when there is a database problem.

I hope we won't have to wait until Tolu is back.

ID: 12453
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 12480 - Posted: 10 May 2005, 17:59:45 UTC - in response to Message 12453.  

Seems to be OK now. Missing trickles are a minor, not a "crippling", error; the latest trickle received tells us how far along you are, how much credit you get, etc.
ID: 12480
[B^S] mavau
Joined: 30 Aug 04
Posts: 142
Credit: 9,936,132
RAC: 0
Message 12483 - Posted: 10 May 2005, 18:14:51 UTC

I'm also getting the "Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed" message.
What I do in that case is disable BOINC network access after an error message.
BTW <b>Carl, thanks for your hard work.</b>

ID: 12483
Mark Rush
Joined: 31 Aug 04
Posts: 4
Credit: 4,674,319
RAC: 603
Message 12485 - Posted: 10 May 2005, 21:40:13 UTC - in response to Message 12483.  

I, too, am getting the "http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed" message.

Given that I am crunching a bunch of other projects along with climate predictor, is there any reason (besides my ignorance of how to do it :) ) to disable BOINC network access?

Also, I want to second the "THANKS FOR ALL THE HARD WORK" being offered to Carl and all the others who are working so diligently on this endeavor!

Mark


> I'm also getting the Scheduler RPC to
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed.
> What I do in that case is disable BOINC network access after an error
> message.
> BTW <b>Carl, thanks for your hard work.</b>
>
ID: 12485
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 12499 - Posted: 11 May 2005, 6:42:25 UTC

Scheduler and trickles are updated :-)
phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board
ID: 12499
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 12500 - Posted: 11 May 2005, 7:10:04 UTC

Mark
Should you need to disable network access, just go to File on the menu of the BOINC GUI and click "Disable BOINC Network Access".

If you're not using the GUI, you would need a CLI instruction. I'm not sure if there is one, let alone what it is.

You are right though, about a problem when running multiple projects.
I think that all you can do is set CPDN to a very low time share.

Les


ID: 12500
old_user355
Joined: 7 Aug 04
Posts: 187
Credit: 44,163
RAC: 0
Message 12540 - Posted: 13 May 2005, 4:21:20 UTC

Is this project going to $#%* Carl?

(Not a complaint, just a question)
ID: 12540
Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 12541 - Posted: 13 May 2005, 6:21:53 UTC

The worst problems seem to be fixed for now. I hope that, given some time, they can also fix the problems that came from the database trouble.

I think that with some smart shell scripts and SQL queries it should be possible to fix the states and outcomes of models with missing trickles and lost reports.


The data must all be available, it's just a question of recovering the stats so the finished models will be used for science and not just sit there waiting for the BOINC report that will never come.
ID: 12541
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 12542 - Posted: 13 May 2005, 7:07:00 UTC - in response to Message 12540.  

> Is this project going to $#%* Carl?
>
> (Not a complaint, just a question)
>

oh I've been saying it has all the time I was there! ;-)

The result outcomes aren't really used for the science, in fact; it's the uploaded files on the upload servers that really matter. That is, once we have a complete set of files we know a run is there, even if the database for that run still says "Working". And since trickles are what make the credits, as long as the final megatrickles get in you'll get the proper credits and summary graphs.
ID: 12542
Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 12545 - Posted: 13 May 2005, 13:00:05 UTC
Last modified: 13 May 2005, 13:00:45 UTC

Now that's good news - yes, the complete graph is there with all expected phases in the models that caused me a headache, so the server must know that it's complete :-)

Missing trickles don't bug me, I can make trickles myself ;-)
ID: 12545
[B^S] mavau
Joined: 30 Aug 04
Posts: 142
Credit: 9,936,132
RAC: 0
Message 12547 - Posted: 13 May 2005, 18:23:37 UTC

Seconded here, Ananas: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2544#12546
<b>Thanks again, Carl</b>.

ID: 12547