Questions and Answers : Windows : Trickles not being reported for one model
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
This model run: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7384116 has apparently not reported any trickles since 23 Apr 08, yet the client thinks it's sending trickles just fine. As recently as about 10 minutes ago, it sent another:

climateprediction.net 6/30/2008 5:03:42 PM Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
climateprediction.net 6/30/2008 5:03:47 PM Scheduler request succeeded: got 0 new tasks

(The preceding is from BoincView, so it doesn't look exactly like the BOINC client's own format.)

In all other respects the client appears to be running fine: other projects are humming along, the model appears to be crunching, etc. Any idea what's going on?
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Quote: "Scheduler request succeeded"

That usually means that the server has the trickles, but on some occasions, when the server is very busy, it doesn't actually accept them. In that case they usually finish uploading on a contact soon after. In the meantime, the trickle files still show on the user's computer, but with a different icon, and greyed out.

Another possibility, which doesn't seem to apply, is that there has been a new computer ID issued, which is usually caused by using a backup. In this case, the trickles will be logged on the 'old' (original) ID. But you don't have another appearance of that computer.

The only other thing I can think of is that you created a new account at about that time, and the trickles since then have been going to the new account. As we haven't a clue about the ID of any such account, it would be up to you to find it.

Edit: If there is a possibility of a second account, then a way to look for it would be:
1. On the computer in question, use Notepad to open client_state.xml.
2. Use Find to look for <project>.
3. Check the next couple of lines to see if they both mention this project name (climateprediction.net). Otherwise, do Find Next.
4. If it's the right project, a few lines below will be <hostid>. Compare that number with the one in this thread, just below your name, to the left of the posts.
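The Notepad search in the steps above can also be sketched in Python. This is a minimal illustration, assuming client_state.xml is well-formed XML with one <project> element per attached project, each containing <project_name> and <hostid> children as the steps describe; the function name find_hostid is hypothetical, and the sample data is trimmed to just the tags the steps mention.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_hostid(client_state_xml: str, project_name: str = "climateprediction.net") -> Optional[int]:
    """Return the <hostid> of the <project> block whose <project_name> matches.

    Assumes (not confirmed by this thread) that each attached project appears as a
    direct child <project> of the root element, with <project_name> and <hostid> inside.
    """
    root = ET.fromstring(client_state_xml.strip())
    for project in root.findall("project"):
        name = (project.findtext("project_name") or "").strip()
        if name == project_name:
            hostid = project.findtext("hostid")
            return int(hostid) if hostid is not None else None
    return None  # no matching project block on this computer


if __name__ == "__main__":
    # Hypothetical, trimmed excerpt of client_state.xml, for illustration only.
    sample = """
    <client_state>
      <project>
        <master_url>http://climateprediction.net/</master_url>
        <project_name>climateprediction.net</project_name>
        <hostid>221046</hostid>
      </project>
    </client_state>
    """
    print(find_hostid(sample))  # 221046
```

To use it on a real machine, pass in the text of client_state.xml from the BOINC data directory (its location depends on how BOINC was installed) and compare the result with the host ID shown on the website.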
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
Quote: "Another possibility, which doesn't seem to apply, is that there has been a new computer ID issued, which is usually caused by using a backup. In this case, the trickles will be logged on the 'old' (original) ID. But you don't have another appearance of that computer."

Well, I checked, and the <hostid> shown in client_state.xml is still 221046. Another strong indication that this is not a problem with the hostid or userid is that the "Last Contact" column on the site updates each time the computer sends a trickle. For example, it presently reads:

1 Jul 2008 16:00:29 UTC

and the client's message log states:

climateprediction.net 7/1/2008 11:59:36 AM Scheduler request succeeded: got 0 new tasks

Other than telling me that our clocks are about 1 minute off, that pretty much tells me the client is communicating with CPDN on the correct hostid. I'm fairly sure this is a problem with the CPDN database and not an issue with the client.
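The "about 1 minute off" estimate can be checked with a little date arithmetic. This sketch assumes the client's local clock was on UTC-4 (US Eastern Daylight Time), which is what 11:59 local versus 16:00 UTC implies; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the client's local time zone was UTC-4 (inferred from 11:59 local vs 16:00 UTC).
LOCAL_TZ = timezone(timedelta(hours=-4))

# "Last Contact" as recorded by the server (already UTC).
server_last_contact = datetime(2008, 7, 1, 16, 0, 29, tzinfo=timezone.utc)

# "Scheduler request succeeded" as logged by the client (local time).
client_logged = datetime(2008, 7, 1, 11, 59, 36, tzinfo=LOCAL_TZ)

# Aware datetimes in different zones subtract correctly in Python.
offset_seconds = (server_last_contact - client_logged).total_seconds()
print(f"Timestamps differ by {offset_seconds:.0f} s")  # prints: Timestamps differ by 53 s
```

A 53-second gap is consistent with the "about 1 minute" estimate, and it is far too small to suggest the contacts are being logged against some other host.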
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0
Quote: "I'm fairly sure this is a problem with the CPDN database and not an issue with the client."

Open up the graphics window and type 'Z' to hide the sidebar and '8' to display the timestep. What phase and timestep number are shown? If it's anything less than phase 3 and timestep 75,614, the model has rewound and the server is ignoring your trickles because they've already been received. If you can't run the graphics, have a look at the file projects/climateprediction.net/hadsm3fub_0169_005941516.xml instead. The phase number and timestep at the last checkpoint are in the <PH> and <TS> tags.

"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
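The rewind check described above can be sketched as a small script that reads the <PH> and <TS> tags from the checkpoint file and compares them with phase 3, timestep 75,614. This rests on two assumptions: that the tags appear as plain <PH>n</PH> and <TS>n</TS> text (a regex is used so no particular root element is required), and that progress is ordered by phase first and timestep second. The names read_checkpoint and has_rewound are hypothetical.

```python
import re
from typing import Tuple

# Last trickle point quoted in this thread for this particular model.
LAST_TRICKLE = (3, 75614)


def read_checkpoint(xml_text: str) -> Tuple[int, int]:
    """Extract (phase, timestep) from the <PH> and <TS> tags of the checkpoint XML."""
    phase = re.search(r"<PH>\s*(\d+)\s*</PH>", xml_text)
    step = re.search(r"<TS>\s*(\d+)\s*</TS>", xml_text)
    if phase is None or step is None:
        raise ValueError("no <PH>/<TS> tags found")
    return int(phase.group(1)), int(step.group(1))


def has_rewound(phase: int, timestep: int, last: Tuple[int, int] = LAST_TRICKLE) -> bool:
    """True if the checkpoint is earlier than the last trickle point.

    Assumption: tuple comparison (phase first, then timestep) matches model progress.
    """
    return (phase, timestep) < last
```

For example, a checkpoint containing <PH>3</PH> and <TS>79311</TS> is past the threshold, so has_rewound(3, 79311) returns False; a result of True would point to the rewind scenario.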
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
Quote: "I'm fairly sure this is a problem with the CPDN database and not an issue with the client."

Sure, here's a copy straight from it:

<V>520</V>
<MD>HADSM3</MD>
<N>hadsm3fub_0169_005941516</N>
<PH>3</PH>
<TS>79311</TS>
<DAY>3</DAY>
<MTH>7</MTH>
<YR>2055</YR>
<HR>7</HR>
<MIN>30</MIN>
<SEC>0</SEC>
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
Of course, I just noticed something that's pretty obviously "not right"... It's been sending trickles all right... Heh... It's been trying to send a trickle approximately once every hour of computation time for... oh, the last two and a half months or so. :O

Thyme, I think you've hit on the likely scenario. It's in a loop that goes back to before the last (checkpoint? trickle point?) and then crosses it again and again and again. Looks like around 1,200-1,300 times, if I were to do some rough math. It's showing 1439 hours of computation time, and by some really rough math I don't think it should take more than 800-900 hours to complete a slab model, even running hyperthreaded. Think I should abort this model?
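To see how often the client is actually sending trickles, the message log can be scanned for trickle-up requests. This sketch assumes log lines in the BoincView layout quoted at the top of this thread (project, M/D/YYYY date, 12-hour time, message); the names trickle_times and intervals_hours are hypothetical. It measures wall-clock gaps, which will be longer than "once every hour of computation time" whenever the CPU is shared with other projects.

```python
import re
from datetime import datetime
from typing import List

# Assumed layout, taken from the BoincView lines quoted in this thread:
# "<project> <M/D/YYYY> <H:MM:SS AM|PM> <message>"
LINE_RE = re.compile(
    r"^(?P<project>\S+) (?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (?P<msg>.*)$"
)


def trickle_times(lines: List[str], project: str = "climateprediction.net") -> List[datetime]:
    """Timestamps of scheduler requests made to send a trickle-up message."""
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("project") == project and "To send trickle-up message" in m.group("msg"):
            times.append(datetime.strptime(m.group("stamp"), "%m/%d/%Y %I:%M:%S %p"))
    return times


def intervals_hours(times: List[datetime]) -> List[float]:
    """Wall-clock gaps between consecutive trickle-up requests, in hours."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

A long, steady run of roughly hourly gaps while the server shows no new trickles would fit the looping-model explanation rather than a server-side backlog.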
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
And for the final piece of evidence: the only other computer running a task from the same workunit is stuck at exactly the same timestep and hasn't trickled for about one month, despite having contacted the server in the last couple of hours. WU = 6153695. The other poor cruncher schlepping the same data over and over: Worldwidewog.

I'm going to hold off on aborting this model until I find out for sure that nothing stored on the client would be useful to the project gurus. Can someone with more knowledge than me let me know for sure when is the right time to kill it?
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0
That seems as conclusive as can be, Thunder! Under the circumstances the best thing you can do is abort the model, but it would be really helpful if you could back up your projects/climateprediction.net and slots directories first, in case the project team want a copy to investigate why the model is behaving this way. I've sent a PM to Worldwidewog to pass on the bad news.
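A minimal Python sketch of the suggested backup, assuming boinc_data points at the BOINC data directory (the folder holding client_state.xml; its location depends on how BOINC was installed). The function name and the timestamped destination folder are illustrative choices, not anything CPDN prescribes, and suspending the task first is a precaution assumed here so files are not copied mid-write.

```python
import shutil
from datetime import datetime
from pathlib import Path

# The two directories mentioned above, relative to the BOINC data directory.
SUBDIRS = ["projects/climateprediction.net", "slots"]


def backup_model_dirs(boinc_data: Path, dest_root: Path) -> Path:
    """Copy the CPDN project directory and the slots directory into a timestamped folder.

    Assumption: the task is suspended (or BOINC stopped) so files do not change mid-copy.
    """
    dest = dest_root / datetime.now().strftime("cpdn_backup_%Y%m%d_%H%M%S")
    for sub in SUBDIRS:
        # copytree creates dest/sub, including any missing parent directories.
        shutil.copytree(boinc_data / sub, dest / sub)
    return dest
```

Copying into a fresh timestamped folder avoids overwriting an earlier backup if the copy has to be repeated.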
Joined: 1 Sep 04 Posts: 42 Credit: 6,475,117 RAC: 0
Quote: "That seems as conclusive as can be, Thunder!"

Thanks for the assistance and advice. I probably would have scratched my head for a while without it. :)
©2025 cpdn.org