Questions and Answers : Windows : PC Freeze stops undamaged WU??
Author | Message |
---|---|
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
I've had a look through the fora, but can't find anything quite like this one. The BOINC Manager downloaded another WU after a freeze and hard reboot, but I can't see any differences between the two sets of XML files other than the names. No error had been reported to the server, so I was expecting an easy restart. Does anyone have a suggestion as to which file I should be looking at for some error or indication of corruption? It wouldn't be something as simple as a changed Host ID, would it? Click here to join the #1 Aussie Alliance on Climate Prediction |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
If the BOINC Manager loses contact with the model for a period of time, it assumes that the model has crashed and downloads a new one. As long as both models look OK, what I would suggest is: * Set 'no more work' against the project (this prevents BOINC downloading other models unnecessarily) * Suspend or abort the newest model (since it's done no work) I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"If the BOINC Manager loses contact with the model for a period of time, it assumes that the model has crashed and downloads a new one." I have the new one suspended, but the old one doesn't show up in BOINC Manager and I can't work out why. I'm stumped. Unless someone knows where the Host ID goes, that's all I can think of. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
There's no evidence of a model failure on the website, but perhaps there may be something on your 'messages' tab, or in the log files (StdErrGui.txt, StdErrDae.txt). There are indeed two computer IDs with active results; I take it you only actually have one in progress? |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"There's no evidence of a model failure on the website, but perhaps there may be something on your 'messages' tab, or in the log files (StdErrGui.txt, StdErrDae.txt)." Yes, I only have the one on that PC. I merged them, but I'm not sure if there is a way to merge the new into the old or, indeed, if that would make any difference. The other PC has very similar specs, but is not shown in the above hyperlink. This is the last time it contacted the server: 2006-03-04 14:19:28 [---] Resuming round-robin CPU scheduling. 2006-03-04 14:19:28 [climateprediction.net] Resuming result sulphur_iy22_000883946_0 using sulphur_cycle version 422 This is the last entry before the restart and what followed: 2006-03-04 15:10:55 [SETI@home] Started download of better_banner.jpg To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK 2006-03-04 21:07:36 [---] Starting BOINC client version 5.2.13 for windows_intelx86 2006-03-04 21:07:36 [---] libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3 2006-03-04 21:07:36 [---] Data directory: C:\Program Files\BOINC 2006-03-04 21:07:36 [---] Missing open tag in state file. 
2006-03-04 21:07:36 [---] Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz 2006-03-04 21:07:36 [---] Memory: 1023.48 MB physical, 2.40 GB virtual 2006-03-04 21:07:36 [---] Disk: 31.25 GB total, 12.08 GB free 2006-03-04 21:07:36 [---] Version change detected (0.0.0 -> 5.2.13); running CPU benchmarks 2006-03-04 21:07:36 [Einstein@Home] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [Leiden Classical] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [LHC@home] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [SETI@home] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [climateprediction.net] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [uFluids] Computer ID: not assigned yet; location: home; project prefs: default 2006-03-04 21:07:36 [World Community Grid] Computer ID: not assigned yet; location: ; project prefs: default Just the usual stuff after this. What seems strange is that all the other projects reloaded after I merged them. I don't know if any of that is going to be of help to you. I can't see anything out of the ordinary except that all the hosts were dropped. I haven't had that happen before. Now that it's summer here, I've had this PC seize up a couple of times. It's not OC'ed and it looks pretty clean, but I may have to reset the heatsink. But that's another story. Thanks for the help. I'm almost at the point of dropping the work I've done so far and running the new WU, but I'm just having trouble letting go of all those hours :) |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... It looks like the BOINC files were corrupted somehow, judging from that... I'm not sure if it's possible to restore the model. There is a BOINC wiki page about restoring multi-project setups from backup; while that's not directly relevant, it may offer enough detail on the internal workings of the state file to look through and see if there are any ideas there. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Yes, it looks like the client_state.xml file has been corrupted. Do you have a backup? If so, restore it. Otherwise, you'll have to start again. That file has numerous sections, all of them 'signed' (i.e. checksummed) to prevent tampering. |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"Yes, it looks like the client_state.xml file has been corrupted." I have found the old work unit! Thanks to both MikeMars and Les Bayliss. I've now gone back to run down the existing non-CPDN work units. I won't cancel the new CPDN WU until I've successfully recovered from the mess. I have implemented a scripted backup regime :) Oh, I did have a backup; that's where I got the replacement client_state.xml file. I've "automated" it somewhat. I'll post back when I've completed the next step, to let you know how successful I've been, or not. Thanks again, I appreciate your patience. Mike |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Sounds like there may be a good outcome :-) Didn't look very positive at first. |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"Sounds like there may be a good outcome :-) Didn't look very positive at first." It has created some odd results. Here's just one example: 7/03/2006 2:20:48 AM|Einstein@Home|ACTIVE_TASKS::restart_tasks(); missing files 7/03/2006 2:20:48 AM|Einstein@Home|Unrecoverable error for result r1_1009.0__165_S4R2a_1 (One or more missing files) I think Einstein had already crunched it with the other host session. I had heaps of strange error messages, all of which seemed to point to missing bits, so I just let them go. Now all I have to do is wait a couple of days while this host "version" runs down its other project WUs, then switch back. When I reinstall the client_state.xml file I'll know if it's all worked out. I'll wait in hope ;-) |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
All is well now. I only lost a handful of hours of crunching from the CPDN WU, which I assume restarted from when it was backed up. That's OK, it still saved me about 200 hours :-) All the other projects are back on track and looking good. Again, thanks to MikeMars and Les Bayliss. |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"If the BOINC Manager loses contact with the model for a period of time, it assumes that the model has crashed and downloads a new one." I've recovered the old one intact and I'm running that now. Ultimately I aborted the newer one and merged all the phantom computers. That leaves me with two P4 3.2 GHz machines running one CPDN WU each. I've even had a few new trickle-ups and my credits are on the rise. Thanks for all the help, Mike, we made it in the end :-) |
Joined: 13 Sep 04 Posts: 228 Credit: 354,979 RAC: 0 |
It's always good to hear that someone managed to recover from a backup! |
Joined: 10 Jan 06 Posts: 55 Credit: 2,555,470 RAC: 1,428 |
"It's always good to hear that someone managed to recover from a backup!" I was so surprised I posted the recovery details on our team web site! |
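
The thread mentions a "scripted backup regime" for client_state.xml but does not show the script. The following is only a minimal sketch of what such a script might look like in Python, not the poster's actual method. The function names, the backup folder layout and the `keep` parameter are hypothetical. The check for a `<client_state>` root tag is an assumption about the file's layout, used as a rough stand-in for the "Missing open tag in state file" condition seen in the log; it is not BOINC's own validation.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

STATE_NAME = "client_state.xml"
ROOT_OPEN = "<client_state>"    # assumed root element of the state file
ROOT_CLOSE = "</client_state>"  # assumed closing tag at the end of the file


def looks_complete(state_file: Path) -> bool:
    """Crude check that the file still has its opening and closing root tags."""
    if not state_file.is_file():
        return False
    text = state_file.read_text(encoding="utf-8", errors="replace").strip()
    return ROOT_OPEN in text and text.endswith(ROOT_CLOSE)


def backup_state(data_dir: Path, backup_dir: Path, keep: int = 10) -> Optional[Path]:
    """Copy the state file to a timestamped backup; skip it if it looks damaged."""
    source = data_dir / STATE_NAME
    if not looks_complete(source):
        return None  # do not add a damaged file to the set of backups
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"client_state_{stamp}.xml"
    shutil.copy2(source, target)
    # Timestamped names sort chronologically, so the oldest backups come first.
    backups = sorted(backup_dir.glob("client_state_*.xml"))
    excess = len(backups) - max(keep, 1)
    for old in backups[:max(excess, 0)]:
        old.unlink()
    return target
```

A scheduled task could call, for example, `backup_state(Path(r"C:\Program Files\BOINC"), Path(r"D:\boinc_backups"))`, where the first path is the data directory shown in the log above and the second is a made-up destination. Copying the file while the client is writing to it may capture a partial file; the tag check is meant to catch that case, though stopping BOINC before taking the copy is likely the safer route.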
©2025 cpdn.org