Questions and Answers : Windows : Hidden model crashes?
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
I started a model and ran it in parallel with classic CPDN (v3.0.01) on my stock Dell laptop. I set the BOINC preferences to run only between 10pm and 7am. When both were running simultaneously, each got around 47% CPU. Every day the log file showed start and stop times as expected, and the model appeared to trickle twice. One day I noticed that although the BOINC GUI reported that the model had been restarted, it was taking no CPU, and the hadsm3* processes were not present in Task Manager. I let it run overnight anyway, and in the morning the BOINC GUI reported that it was suspended again for time of day. I saw the same behaviour the next night: the GUI reported a normal resume at 10pm and suspend at 7am, even though there was no CPU usage. Concluding that the model had crashed and the BOINC GUI hadn't noticed, I restarted the computer. This aborted the first model and downloaded a new one. 1. Surely BOINC should report when the model crashes. 2. I wouldn't expect a restart to cause a download of a new model. 3. I've no idea what caused the crash. The classic model continues to run without any problems. |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The hadsm3* program is the interface between BOINC and the CPDN model program hadsm3um*. Both programs should continue to run (but not actually do anything) when BOINC suspends processing. If hadsm3* is terminated (manually or abnormally), you could get an XML file corruption, which can cause the model to be aborted. The stdout.* and stderr.* files in your BOINC directory might give further clues. |
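Since both programs are expected to stay resident even while BOINC is suspended, their absence from the process list is a more reliable signal than the GUI's resume/suspend messages. A minimal Python sketch, assuming Python 3.9+ and a Windows version that includes the `tasklist` command: it parses the header-less CSV printed by `tasklist /FO CSV /NH` and returns image names starting with `hadsm3`. The prefix comes from this thread; the exact image names depend on the installed application version, and `find_processes` is a hypothetical helper name.

```python
import csv
import io
import subprocess
import sys


def find_processes(tasklist_csv: str, prefix: str = "hadsm3") -> list[str]:
    """Return image names from `tasklist /FO CSV /NH` output that start with prefix."""
    matches = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Each non-empty row looks like: "Image Name","PID","Session Name","Session#","Mem Usage"
        if row and row[0].lower().startswith(prefix.lower()):
            matches.append(row[0])
    return matches


if __name__ == "__main__" and sys.platform == "win32":
    # /FO CSV selects CSV output; /NH suppresses the header row.
    out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                         capture_output=True, text=True, check=True).stdout
    found = find_processes(out)
    print(found if found else "no hadsm3* processes running")
```

Run during the 10pm to 7am window: an empty result while the GUI says computation has resumed matches the symptom described at the start of the thread.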
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
I can't find anything out of the ordinary in either stderr or stdout. I rebooted on 11th Aug which caused the model to be aborted. Prior to that, stdout just reports (incorrectly) the suspend and restart activity. Can't actually tell at all when the model crashed.

stderr.old:
2004-08-05 21:20:35 [SETI@home] Scheduler RPC to http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi failed
2004-08-05 21:20:35 [SETI@home] No schedulers responded
2004-08-05 21:20:35 [SETI@home] Deferring communication with project for 1 minutes and 0 seconds
2004-08-05 21:21:53 [SETI@home] No work from project
2004-08-05 21:21:53 [SETI@home] Deferring communication with project for 1 days, 0 hours, 0 minutes, and 0 seconds

stdout.txt:
2004-08-05 21:20:18 [---] Starting BOINC client version 4.03 for windows_intelx86
2004-08-05 21:20:18 [SETI@home] Project prefs: no separate prefs for home; using your defaults
2004-08-05 21:20:18 [---] State file has different major version (3.19); resetting projects
2004-08-05 21:20:18 [SETI@home] Resetting project
2004-08-05 21:20:18 [SETI@home] Host ID is 48433
2004-08-05 21:20:18 [---] General prefs: from SETI@home (last modified 2004-07-04 18:18:01)
2004-08-05 21:20:18 [---] General prefs: no separate prefs for home; using your defaults
2004-08-05 21:20:31 [---] CPU scheduler starvation imminent; requesting more work
2004-08-05 21:20:31 [SETI@home] Requesting 10840 seconds of work
2004-08-05 21:20:32 [SETI@home] Sending request to scheduler: http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi
2004-08-05 21:21:30 [http://climateprediction.net/] Project prefs: no separate prefs for home; using your defaults
2004-08-05 21:21:31 [---] CPU scheduler starvation imminent; requesting more work
2004-08-05 21:21:34 [---] CPU scheduler starvation imminent; requesting more work
2004-08-05 21:21:34 [http://climateprediction.net/] Requesting 5420 seconds of work
2004-08-05 21:21:34 [http://climateprediction.net/] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-08-05 21:21:37 [http://climateprediction.net/] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-08-05 21:21:37 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2004-08-05 21:21:37 [climateprediction.net] Started download of hadsm3_4.02_windows_intelx86.exe
2004-08-05 21:21:37 [climateprediction.net] Started download of hadsm3data_4.02_windows_intelx86.zip
2004-08-05 21:21:39 [---] CPU scheduler starvation imminent; requesting more work
2004-08-05 21:21:39 [SETI@home] Requesting 5420 seconds of work
2004-08-05 21:21:39 [SETI@home] Sending request to scheduler: http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi
2004-08-05 21:21:53 [SETI@home] Scheduler RPC to http://setiboincdata.ssl.berkeley.edu/sah_cgi/cgi succeeded
2004-08-05 21:21:53 [SETI@home] Message from server: To participate in this project, you must use major version 3 of the BOINC core client. Your core client is major version 4.
2004-08-05 21:21:53 [---] General prefs: from SETI@home (last modified 2004-07-04 18:18:01)
2004-08-05 21:21:53 [---] General prefs: using your defaults
2004-08-05 21:22:11 [climateprediction.net] Finished download of hadsm3_4.02_windows_intelx86.exe
2004-08-05 21:22:11 [climateprediction.net] Approximate throughput 30805.184019 bytes/sec
2004-08-05 21:22:15 [climateprediction.net] Started download of hadsm3um_4.02_windows_intelx86.zip
2004-08-05 21:23:43 [climateprediction.net] Finished download of hadsm3um_4.02_windows_intelx86.zip
2004-08-05 21:23:43 [climateprediction.net] Approximate throughput 21780.601688 bytes/sec
2004-08-05 21:23:43 [climateprediction.net] Started download of hadsm3se_4.02_windows_intelx86.zip
2004-08-05 21:24:12 [climateprediction.net] Finished download of hadsm3data_4.02_windows_intelx86.zip
2004-08-05 21:24:12 [climateprediction.net] Approximate throughput 29680.966218 bytes/sec
2004-08-05 21:24:12 [climateprediction.net] Started download of 0091_000025310.zip
2004-08-05 21:24:14 [climateprediction.net] Finished download of 0091_000025310.zip
2004-08-05 21:24:14 [climateprediction.net] Approximate throughput 5004.223612 bytes/sec
2004-08-05 21:24:17 [climateprediction.net] Finished download of hadsm3se_4.02_windows_intelx86.zip
2004-08-05 21:24:17 [climateprediction.net] Approximate throughput 25218.657865 bytes/sec
2004-08-05 21:24:17 [climateprediction.net] Starting computation for result 0091_000025310_0 using hadsm3 version 4.02
2004-08-05 21:40:06 [SETI@home] Resetting project
2004-08-05 21:40:06 [SETI@home] Detaching from project
2004-08-05 21:44:49 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-08-05 21:44:52 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-08-05 21:44:52 [climateprediction.net] General preferences have been updated
2004-08-05 21:44:52 [---] General prefs: from climateprediction.net (last modified 2004-08-05 21:42:21)
2004-08-05 21:44:52 [---] General prefs: no separate prefs for home; using your defaults
2004-08-05 21:44:52 [---] Suspending computation and network activity - time of day
2004-08-05 22:00:00 [---] Resuming computation and network activity
2004-08-06 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-06 22:00:00 [---] Resuming computation and network activity
2004-08-07 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-07 22:00:00 [---] Resuming computation and network activity
2004-08-07 22:09:16 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-08-07 22:09:19 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-08-07 22:09:19 [---] General prefs: from climateprediction.net (last modified 2004-08-05 21:42:21)
2004-08-07 22:09:19 [---] General prefs: using your defaults
2004-08-08 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-08 22:00:00 [---] Resuming computation and network activity
2004-08-09 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-09 22:00:00 [---] Resuming computation and network activity
2004-08-10 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-10 22:00:00 [---] Resuming computation and network activity
2004-08-11 07:00:00 [---] Suspending computation and network activity - time of day
2004-08-11 22:00:00 [---] Resuming computation and network activity |
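The stdout.txt above follows a regular pattern: time-of-day suspends at 07:00, resumes at 22:00, and occasional climateprediction.net scheduler RPCs, which the thread goes on to read as trickles. A minimal Python sketch, assuming one entry per line in the `YYYY-MM-DD HH:MM:SS [project] message` form shown, that totals how many hours BOINC had computation resumed before each successful CPDN scheduler RPC. It treats a "Starting computation" line like a resume. This measures time BOINC *allowed* computation, not CPU time the model actually used, which is why a log like this cannot reveal the crash by itself. Function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines like "2004-08-05 22:00:00 [---] Resuming computation and network activity"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")


def parse(lines):
    """Yield (timestamp, project, message) for each well-formed log line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield ts, m.group(2), m.group(3)


def run_hours_before_rpcs(lines, project="climateprediction.net"):
    """Return (rpc_time, resumed_hours_so_far) for each successful scheduler RPC.

    Resumed time is wall-clock time between a resume (or "Starting computation")
    line and the next "Suspending computation" line; it says nothing about
    whether the science application was actually using the CPU.
    """
    running_since = None  # start of the current resumed interval, if any
    total = 0.0           # seconds accumulated over completed intervals
    results = []
    for ts, proj, msg in parse(lines):
        if msg.startswith(("Resuming computation", "Starting computation")):
            if running_since is None:
                running_since = ts
        elif msg.startswith("Suspending computation"):
            if running_since is not None:
                total += (ts - running_since).total_seconds()
                running_since = None
        elif project in proj and "Scheduler RPC" in msg and msg.endswith("succeeded"):
            so_far = total
            if running_since is not None:
                so_far += (ts - running_since).total_seconds()
            results.append((ts, so_far / 3600))
    return results
```

Fed the stdout.txt above, the 2004-08-07 22:09:19 RPC comes after roughly 18.5 resumed hours: about 20 minutes on the first evening plus two full 9-hour nights.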
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
BTW: This is the stdout file for the new model on restart. The reason for the new model download seems to be starvation: it's lost track of the previous model...

2004-08-11 23:22:12 [---] Starting BOINC client version 4.03 for windows_intelx86
2004-08-11 23:22:12 [---] No general preferences found - using BOINC defaults
2004-08-11 23:22:12 [---] Running CPU benchmarks
2004-08-11 23:22:18 [---] Suspending computation and network activity - running CPU benchmarks
2004-08-11 23:23:13 [---] Benchmark results:
2004-08-11 23:23:13 [---] Number of CPUs: 1
2004-08-11 23:23:13 [---] 1827 double precision MIPS (Whetstone) per CPU
2004-08-11 23:23:13 [---] 3773 integer MIPS (Dhrystone) per CPU
2004-08-11 23:23:13 [---] Finished CPU benchmarks
2004-08-11 23:23:14 [---] Resuming computation and network activity
2004-08-11 23:25:17 [http://climateprediction.net/] Project prefs: using your defaults
2004-08-11 23:25:18 [---] CPU scheduler starvation imminent; requesting more work
2004-08-11 23:25:22 [---] CPU scheduler starvation imminent; requesting more work
2004-08-11 23:25:22 [http://climateprediction.net/] Requesting 17280 seconds of work
2004-08-11 23:25:22 [http://climateprediction.net/] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-08-11 23:25:25 [http://climateprediction.net/] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-08-11 23:25:25 [http://climateprediction.net/] General preferences have been updated
2004-08-11 23:25:25 [---] General prefs: from climateprediction.net (last modified 2004-08-05 21:42:21)
2004-08-11 23:25:25 [---] General prefs: no separate prefs for home; using your defaults
2004-08-11 23:25:25 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2004-08-11 23:25:25 [climateprediction.net] Started download of hadsm3_4.02_windows_intelx86.exe
2004-08-11 23:25:25 [climateprediction.net] Started download of hadsm3data_4.02_windows_intelx86.zip
2004-08-11 23:25:57 [climateprediction.net] Finished download of hadsm3_4.02_windows_intelx86.exe
2004-08-11 23:25:57 [climateprediction.net] Approximate throughput 32978.527555 bytes/sec
2004-08-11 23:25:57 [climateprediction.net] Started download of hadsm3um_4.02_windows_intelx86.zip
2004-08-11 23:27:06 [climateprediction.net] Finished download of hadsm3um_4.02_windows_intelx86.zip
2004-08-11 23:27:06 [climateprediction.net] Approximate throughput 29382.080247 bytes/sec
2004-08-11 23:27:06 [climateprediction.net] Started download of hadsm3se_4.02_windows_intelx86.zip
2004-08-11 23:27:38 [climateprediction.net] Finished download of hadsm3se_4.02_windows_intelx86.zip
2004-08-11 23:27:38 [climateprediction.net] Approximate throughput 26471.214629 bytes/sec
2004-08-11 23:27:39 [climateprediction.net] Started download of 01xl_000027490.zip
2004-08-11 23:27:42 [climateprediction.net] Finished download of 01xl_000027490.zip
2004-08-11 23:27:42 [climateprediction.net] Approximate throughput 3204.227296 bytes/sec
2004-08-11 23:27:47 [climateprediction.net] Finished download of hadsm3data_4.02_windows_intelx86.zip
2004-08-11 23:27:47 [climateprediction.net] Approximate throughput 31635.423784 bytes/sec
2004-08-11 23:27:48 [climateprediction.net] Starting computation for result 01xl_000027490_0 using hadsm3 version 4.02
2004-08-12 07:00:00 [---] Suspending computation and network activity - time of day |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
> I can't find anything out of the ordinary in either stderr or stdout. I rebooted on 11th Aug which caused the model to be aborted. Prior to that, stdout just reports (incorrectly) the suspend and restart activity. Can't actually tell at all when the model crashed.
>
> stdout.txt:
> 2004-08-05 21:20:18 [---] Starting BOINC client version 4.03 for windows_intelx86
> 2004-08-05 21:20:18 [SETI@home] Project prefs: no separate prefs for home; using your defaults
> 2004-08-05 21:20:18 [---] State file has different major version (3.19); resetting projects
> 2004-08-05 21:20:18 [SETI@home] Resetting project
> 2004-08-05 21:20:18 [SETI@home] Host ID is 48433
> 2004-08-05 21:21:30 [http://climateprediction.net/] Project prefs: no separate prefs for home; using your defaults
> 2004-08-05 21:21:53 [SETI@home] Message from server: To participate in this project, you must use major version 3 of the BOINC core client. Your core client is major version 4.

I think this is part of your problem. BOINC is attached to the SETI and CPDN projects, but they're not compatible: SETI won't run with a major-version-4 BOINC client (you're running 4.03), and CPDN requires it. |
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
I did start running while still registered to SETI, but I detached from that project within 20 minutes of starting BOINC 4.02 up, and 6 days before this error occurred. Do you really think that was the problem? Why wait 5 days to crash? It certainly trickled at least once after I detached from SETI. Also, I just noticed that I now have my machine registered twice. It was first registered when I joined the project on August 5th (ID 158). But after the reboot, when the new model downloaded, it was apparently registered a second time (ID 936). I only just found this out browsing my account. I certainly don't want my machine re-registered on every model download. Think about my stats! |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
> 2004-08-05 21:24:17 [climateprediction.net] Starting computation for result 0091_000025310_0 using hadsm3 version 4.02
> 2004-08-05 21:40:06 [SETI@home] Resetting project
> 2004-08-05 21:40:06 [SETI@home] Detaching from project

Missed your detach from SETI here. Sorry.

> 2004-08-05 21:44:52 [---] Suspending computation and network activity - time of day
> 2004-08-05 22:00:00 [---] Resuming computation and network activity
> 2004-08-06 07:00:00 [---] Suspending computation and network activity - time of day
> 2004-08-06 22:00:00 [---] Resuming computation and network activity
> 2004-08-07 07:00:00 [---] Suspending computation and network activity - time of day
> 2004-08-07 22:00:00 [---] Resuming computation and network activity
> 2004-08-07 22:09:16 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
> 2004-08-07 22:09:19 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded

That's got to be a trickle, after just over 2 days of running during your timed period.

> 2004-08-07 22:09:19 [---] General prefs: from climateprediction.net (last modified 2004-08-05 21:42:21)
> 2004-08-07 22:09:19 [---] General prefs: using your defaults
> 2004-08-08 07:00:00 [---] Suspending computation and network activity - time of day
> 2004-08-08 22:00:00 [---] Resuming computation and network activity
> 2004-08-09 07:00:00 [---] Suspending computation and network activity - time of day
> 2004-08-09 22:00:00 [---] Resuming computation and network activity

If things were running normally you should probably have had a second trickle somewhere around here.

Just had a thought. The file 0091_000025310.xml should still be in your climateprediction.net directory, and its timestamp will reveal when things went wrong. Then look in the event log to see if there's anything to indicate what happened around that time.
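Windows Explorer shows the file's timestamp directly; as a scriptable alternative, here is a minimal Python sketch that prints a file's last-modified time. The path is passed on the command line because the project folder name under the BOINC directory varies between installs; `last_modified` is a hypothetical helper name.

```python
import os
import sys
from datetime import datetime


def last_modified(path: str) -> datetime:
    """Return a file's last-modified time as a naive local datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python last_modified.py <full path to 0091_000025310.xml>
    print(last_modified(sys.argv[1]).strftime("%Y-%m-%d %H:%M"))
```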
The fact that your machine has re-registered points towards a corruption of the client_state.xml file in the BOINC directory. I wouldn't worry too much about your stats (although you've obviously not been credited for the timesteps done after the first trickle): the new model is still crunching for the same account, which will just have one more computer. |
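The re-registration explanation rests on client_state.xml having been damaged. A quick first check is whether the file still parses as XML at all. A minimal Python sketch, assuming client_state.xml is an ordinary XML document: a truncated or half-written file will usually fail to parse, but a file that does parse can still hold wrong values, so `True` does not rule out corruption. `is_well_formed` is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET


def is_well_formed(path: str) -> bool:
    """Return True if the file parses as XML, False on a parse error."""
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_xml.py <path to client_state.xml>
    print(is_well_formed(sys.argv[1]))
```

Work on a copy of the file rather than the live one while BOINC is running.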
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
> If things were running normally you should probably have had a second trickle somewhere around here. Just had a thought. The file 0091_000025310.xml should still be in your climateprediction.net directory, and its timestamp will reveal when things went wrong. Then look in the event log to see if there's anything to indicate what happened around that time.

Last modified 08.08.04 06:47

System event log extract:
Information 08/08/2004 14:29:30 Service Control Manager None 7036 N/A D800
Information 08/08/2004 14:29:30 Service Control Manager None 7035 SYSTEM D800
Warning 08/08/2004 10:32:54 Dhcp None 1003 N/A D800
Information 08/08/2004 10:26:17 RemoteAccess None 20159 N/A D800
Information 08/08/2004 10:26:08 RemoteAccess None 20158 N/A D800
Information 08/08/2004 10:25:48 Service Control Manager None 7035 SYSTEM D800
Information 08/08/2004 10:25:43 Service Control Manager None 7035 SYSTEM D800
Information 08/08/2004 10:25:38 Service Control Manager None 7035 SYSTEM D800
Information 08/08/2004 01:04:13 srservice None 108 N/A D800

I'm not sure whether there is any sign of a problem there. |
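To relate timestamps like these to the 10pm to 7am schedule, it helps to check which ones fall inside the run window, which wraps past midnight. A minimal Python sketch, assuming "08.08.04 06:47" means 8 August 2004 and that all times are local; `in_run_window` is a hypothetical helper name.

```python
from datetime import datetime, time


def in_run_window(ts: datetime, start: time = time(22, 0), end: time = time(7, 0)) -> bool:
    """True if ts falls in the daily window [start, end), which may wrap past midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    # Window wraps midnight, e.g. 22:00 -> 07:00
    return t >= start or t < end
```

For the values in this thread, `in_run_window(datetime(2004, 8, 8, 6, 47))` is `True` (13 minutes before the 07:00 suspend). Of the System log entries, only the 01:04:13 srservice event falls inside the window; the 10:25 to 14:29 entries are all outside it.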
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
Sorry, I should have specified the application log :( |
©2025 cpdn.org