Trickles update hanging/delayed again?
Author | Message |
---|---|
Send message Joined: 30 Aug 04 Posts: 50 Credit: 237,894 RAC: 0 |
Hello there, I'm seeing an increasing delay between trickles being reported and their display in the user account. For example, I have a model at 35.86% in step 19768 of phase 2 whose last displayed trickle is from step 216040 of phase 1, reported on 2nd May. I'm starting to get worried, because BOINC shows no errors when reporting the trickles. Has anyone else observed that? greetz, Uli |
Send message Joined: 5 Aug 04 Posts: 390 Credit: 2,475,242 RAC: 0 |
Yes, see for example the thread "Server data incosistent" on the classic board: http://www.climateprediction.net/board/viewtopic.php?t=2934 phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board |
Send message Joined: 21 Oct 04 Posts: 24 Credit: 207,633 RAC: 0 |
Yes, I noticed this too. My last trickle was on 05.08.05 at 00:15 UTC; since then I have reached at least two more 'milestones' that should have trickled, but nothing happened. Just for information, these were reported but not trickled: 08.05.2005 20:33:46|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi 08.05.2005 20:33:47|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded and 09.05.2005 13:55:36|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi 09.05.2005 13:55:37|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded greetz from Switzerland, littleBouncer |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
It looks like the server ran out of space. I cleaned up old images, so hopefully that will give it room for a while. I also restarted the trickle-handling processes, so that should reflush everything. Trickles/credits are based on the latest trickle received, so even if you have a "gap" of missing trickles, the next one will give the correct credit. |
Send message Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
Looks like it's starting to catch up now. Thanks Carl. |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> Looks like it's starting to catch up now. Thanks Carl. It could very well break down again: the servers are maxed out on space and the database is running very slowly, so a lot of inserts and updates are timing out because the row locking takes so long, etc. It's something they'll have to get someone at Oxford onto (Tolu is away on holiday and already overworked anyway). |
Send message Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
> (Tolu is away on holiday and already overworked anyway) That qualifies for understatement of the year. |
Send message Joined: 30 Aug 04 Posts: 50 Credit: 237,894 RAC: 0 |
It works now, kinda: the missing trickles appeared :D But when BOINC now contacts the server, I get this in the BOINC log: climateprediction.net - 2005-05-10 00:14:03 - Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi climateprediction.net - 2005-05-10 00:14:05 - Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed climateprediction.net - 2005-05-10 00:14:05 - No schedulers responded climateprediction.net - 2005-05-10 00:14:05 - Deferring communication with project for 13 minutes and 39 seconds ...and this in my Proxomitron log: +++GET 146+++ POST /cpdnboinc_cgi/cgi HTTP/1.0 Pragma: no-cache Cache-Control: no-cache Host: climateapps2.oucs.ox.ac.uk:80 Content-Type: application/octet-stream Content-Length: 4197 Connection: keep-alive Browser reload detected... Posting 4197 bytes... +++RESP 146+++ HTTP/1.0 500 Internal Server Error Date: Mon, 09 May 2005 22:10:33 GMT Server: Apache/2.0.50 (Unix) PHP/4.3.8 Content-Length: 537 Connection: close Content-Type: text/html; charset=iso-8859-1 +++CLOSE 146+++ I think the server now has a different kind of problem. I don't want to sound too critical ;) I'm a developer myself and know the job is hard :) greetz, Uli |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
The database is at 100% disk space utilization, so it's just thrashing away with no room for new requests, trickles, etc. I've told the relevant (remaining!) people at Oxford, so hopefully they can get someone on it, as it's a bit more work than I can do from here (and for free! :-) |
Send message Joined: 17 Sep 04 Posts: 25 Credit: 196,284 RAC: 0 |
I still have one missing, which I uploaded on the 8th. I'm suspending CPDN until this issue is resolved, as I have seen all sorts of weird things happen in other projects when there is a database problem. I hope we won't have to wait until Tolu is back. |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Seems to be OK now. Missing trickles are minor and not a "crippling" error; the latest trickle received lets us know how far along you are, how much credit you get, etc. |
Send message Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
I'm also getting the "Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed" message. What I do in that case is disable BOINC network access after an error message. BTW Carl, thanks for your hard work. |
Send message Joined: 31 Aug 04 Posts: 4 Credit: 4,671,834 RAC: 423 |
I, too, am getting the "http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed" message. Given that I am crunching a bunch of other projects along with climate predictor, is there any reason (besides my ignorance of how to do it :) ) to disable BOINC network access? Also, I want to second the "THANKS FOR ALL THE HARD WORK" being offered to Carl and all the others who are working so diligently on this endeavor! Mark > I'm also getting the Scheduler RPC to > http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed. > What I do in that case is disable BOINC network access after an error > message. > BTW Carl, thanks for your hard work. |
Send message Joined: 5 Aug 04 Posts: 390 Credit: 2,475,242 RAC: 0 |
Scheduler and trickles are updated :-) phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Mark, should you need to disable network access, just go to File on the menu of the BOINC GUI and click "Disable BOINC Network Access". If you're not using the GUI, you would need a CLI instruction; I'm not sure if there is one, let alone what it is. You are right, though, about a problem when running multiple projects. I think that all you can do is set CPDN to a very low time share. Les |
Send message Joined: 7 Aug 04 Posts: 187 Credit: 44,163 RAC: 0 |
Is this project going to $#%* Carl? (Not a complaint, just a question) |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
The worst problems seem to be fixed for now; I hope they can also fix the problems that came from the database trouble after some time. I think that with some smart shell scripts and SQL it should be possible to fix the states and outcomes of models with missing trickles and lost reports. The data must all be available; it's just a question of recovering the stats so the finished models will be used for science and not just sit there waiting for the BOINC report that will never come. |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> Is this project going to $#%* Carl? > > (Not a complaint, just a question) > Oh, I've been saying it has all the time I was there! ;-) The result outcomes aren't really used for the science, in fact; it's the uploaded files on the upload servers that really matter, i.e. when we have a complete set we know a run is there even if the database for that run says "Working". And since trickles are what make the credits, as long as the final megatrickles get in you'll get the proper credits and summary graphs. |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Now that's good news - yes, the complete graph is there with all expected phases for the models that caused me headaches, so the server must know that they're complete :-) Missing trickles don't bug me, I can make trickles myself ;-) |
Send message Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
Seconded here, Ananas: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2544#12546 Thanks again, Carl. |
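
For anyone wondering why a "gap" of missing trickles doesn't cost credit, here is a rough sketch of the idea Carl describes in this thread: credit follows the latest trickle received, not the count of trickles. Everything in it is an assumption for illustration. The `Trickle` fields, the `steps_per_phase` parameter and the `CREDIT_PER_TIMESTEP` rate are made up, and this is not the actual CPDN server code.

```python
from dataclasses import dataclass

# Hypothetical rate, for illustration only; not CPDN's real credit formula.
CREDIT_PER_TIMESTEP = 0.01


@dataclass(frozen=True)
class Trickle:
    phase: int     # model phase the trickle was sent from (1-based)
    timestep: int  # timestep within that phase


def latest_trickle(received):
    """Return the furthest-along trickle, or None if none arrived."""
    return max(received, key=lambda t: (t.phase, t.timestep), default=None)


def credit_for(received, steps_per_phase):
    """Credit depends only on the latest trickle, so gaps do not matter."""
    latest = latest_trickle(received)
    if latest is None:
        return 0.0
    # Total progress = completed earlier phases + steps into the current one.
    total_steps = (latest.phase - 1) * steps_per_phase + latest.timestep
    return total_steps * CREDIT_PER_TIMESTEP


if __name__ == "__main__":
    # Assumed phase length of 1000 steps, purely for the example.
    full = [Trickle(1, 100), Trickle(1, 200), Trickle(2, 50)]
    gappy = [Trickle(1, 100), Trickle(2, 50)]  # one trickle never arrived
    print(credit_for(full, 1000) == credit_for(gappy, 1000))  # True
```

Under these assumptions the host with the missing trickle ends up with the same credit as the one with the full set, because only the most recent trickle is consulted.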