Thread '615 hours of work disappeared'

Author	Message
old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20809 - Posted: 27 Feb 2006, 18:18:50 UTC
Today, after 615 hours of work and with some 500 left to go, climate uploaded a result; the remaining 500 hours went puff on my system, and I can find no result for the upload on the results page. What happened?
ID: 20809

MikeMarsUK Volunteer moderator Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0
Message 20811 - Posted: 27 Feb 2006, 19:18:58 UTC Last modified: 27 Feb 2006, 19:31:28 UTC
Was there anything useful in the 'messages' tab (or, if you've restarted since, in the log files, something like stderrgui.txt or stderrdae.txt)? I don't see any trickles since the beginning of December; were you running off-line?
I'm a volunteer and my views are my own. News and Announcements and FAQ
ID: 20811

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20813 - Posted: 27 Feb 2006, 20:51:33 UTC - in response to Message 20811.
> Was there anything useful in the 'messages' tab? (or if you've restarted since, in the log files? (something like stderrgui.txt, stderrdae.txt)). I don't see any trickles since the beginning of December, were you running off-line?
No, running on-line all the time as usual. The messages tab looks normal. It says reporting 1 result at 09:39:50 on 27 Feb 2006 (EST). It has not trickled for 615 hours. I figured climate had changed. That begs the question "why are the work units so preposterously large"?
ID: 20813

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 20815 - Posted: 27 Feb 2006, 21:12:52 UTC
> begs the question "why are the work units so preposterously large"?
The FAQ. You are modeling the Earth's atmosphere. This is not a trivial matter.
ID: 20815

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20833 - Posted: 28 Feb 2006, 4:24:11 UTC - in response to Message 20815.
> > begs the question "why are the work units so preposterously large"?
> The FAQ. You are modeling the Earth's atmosphere. This is not a trivial matter.
If it were trivial I wouldn't be wasting my time with it. But no one has addressed the issue: what happened?
ID: 20833

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 20834 - Posted: 28 Feb 2006, 5:03:08 UTC
I was pointing out that the size of the WUs isn't "preposterously large", just normal for the work being undertaken. As far as anyone can tell from looking at the pages for the 2 models currently shown as still active on your computer, nothing has gone wrong, which is why you were asked to provide the messages. The two files are stdoutdae.txt and stderrdae.txt, both in the BOINC folder. Until we have more info, there's nothing we can tell you, other than that:
sulphur_dm7r_000635319_0 had its first trickle on 01 Dec 2005 03:49:34, and nothing since. If it's still running, there should be a large number of trickle_up_... files in the folder of this model.
sulphur_dzmm_000652702_0 was sent to you on 2 Dec 2005 12:24:43 UTC, and shows no trickles. The same applies to the trickle files for this model.
ID: 20834
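The folder check described above (looking for trickle_up_... files in the model's folder) could be scripted roughly as follows. This is only a sketch under assumptions: the thread gives nothing beyond the `trickle_up_` prefix, so the folder path below is a placeholder and `list_trickle_files` is a hypothetical helper, not part of BOINC or the climate model.

```python
from pathlib import Path


def list_trickle_files(model_dir):
    """Return the sorted names of files starting with 'trickle_up_' in model_dir.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("trickle_up_*") if p.is_file())


if __name__ == "__main__":
    # Placeholder path (an assumption): point this at the model's own folder.
    names = list_trickle_files("path/to/model/folder")
    print(len(names), "trickle files found")
```

A count of zero for a model that has supposedly been running for weeks would match what the server pages show, namely that no trickles were ever produced.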

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20883 - Posted: 1 Mar 2006, 4:53:50 UTC - in response to Message 20833.
> > > begs the question "why are the work units so preposterously large"?
> > The FAQ. You are modeling the Earth's atmosphere. This is not a trivial matter.
> If it were trivial I wouldn't be wasting my time with it. But no one has addressed the issue: what happened?
Right. Where would you like me to send the files?
ID: 20883

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 20921 - Posted: 1 Mar 2006, 19:01:21 UTC
There's something odd about the dates/times listed on your account, so I'm looking into it.
Re: sending files. If you mean trickles, BOINC does that. If you mean the std... files, I just want a copy and paste of some of the messages at the end. 10-20 lines from stdoutdae.txt should do.
ID: 20921
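For anyone who would rather not scroll through a long log by hand, the last 10-20 lines of stdoutdae.txt can be pulled out with a few lines of Python. This is a minimal sketch: `last_lines` is a hypothetical helper name, and the file location is an assumption (the thread only says the file sits in the BOINC folder).

```python
from collections import deque


def last_lines(path, n=20):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    # Assumed location: run this from inside the BOINC folder.
    try:
        for entry in last_lines("stdoutdae.txt", 20):
            print(entry)
    except FileNotFoundError:
        print("stdoutdae.txt not found in the current folder")
```

The output can then be pasted straight into a reply.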

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20935 - Posted: 1 Mar 2006, 21:14:27 UTC - in response to Message 20921.
> There's something odd about the dates/times listed on your account, so I'm looking into it. Re: sending files. If you mean trickles, BOINC does that. If you mean the std... files, I just want a copy and paste of some of the messages at the end. 10-20 lines from stdoutdae.txt should do.
Thank you. Here it is:
http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
2006-02-28 22:02:25 [LHC@home] Project is down
2006-02-28 22:26:57 [SETI@home] Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2006-02-28 22:26:57 [SETI@home] Reason: To report results
2006-02-28 22:26:57 [SETI@home] Reporting 2 results
2006-02-28 22:27:02 [SETI@home] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2006-02-28 22:35:32 [boincsimap] Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
2006-02-28 22:35:32 [boincsimap] Reason: To fetch work
2006-02-28 22:35:32 [boincsimap] Requesting 7 seconds of new work, and reporting 1 results
2006-02-28 22:35:37 [boincsimap] Scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi failed with a return value of 500
2006-02-28 22:35:37 [boincsimap] No schedulers responded
2006-02-28 22:36:38 [boincsimap] Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
2006-02-28 22:36:38 [boincsimap] Reason: To fetch work
2006-02-28 22:36:38 [boincsimap] Requesting 1333 seconds of new work, and reporting 1 results
2006-02-28 22:36:43 [boincsimap] Scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi succeeded
2006-02-28 22:36:45 [boincsimap] Started download of 60301211.025713
2006-02-28 22:36:50 [boincsimap] Finished download of 60301211.025713
2006-02-28 22:36:50 [boincsimap] Throughput 409085 bytes/sec
2006-02-28 22:36:51 [---] request_reschedule_cpus: files downloaded
2006-02-28 22:36:51 [Predictor @ Home] Restarting result h0021B_1_34605_1 using mfoldB125 version 428
2006-02-28 22:36:51 [Einstein@Home] Restarting result z1_0953.5__768_S4R2a_2 using albert version 437
2006-02-28 22:36:51 [SETI@home] Pausing result 26my01aa.6598.18914.61078.1.184_0 (removed from memory)
2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)
2006-02-28 22:36:52 [---] request_reschedule_cpus: process exited
2006-02-28 23:02:29 [LHC@home] Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
2006-02-28 23:02:29 [LHC@home] Reason: To fetch work
2006-02-28 23:02:29 [LHC@home] Requesting 17280 seconds of new work
2006-02-28 23:02:34 [LHC@home] Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2006-02-28 23:02:34 [LHC@home] Message from server: Server can't open database
2006-02-28 23:02:34 [LHC@home] Project is down
2006-02-28 23:09:40 [Predictor @ Home] Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
2006-02-28 23:09:40 [Predictor @ Home] Reason: To report results
2006-02-28 23:09:40 [Predictor @ Home] Reporting 1 results
2006-02-28 23:09:45 [Predictor @ Home] Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
2006-02-28 23:15:10 [Predictor @ Home] Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
2006-02-28 23:15:10 [Predictor @ Home] Reason: To fetch work
2006-02-28 23:15:10 [Predictor @ Home] Requesting 16 seconds of new work
2006-02-28 23:15:15 [Predictor @ Home] Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
2006-02-28 23:15:17 [Predictor @ Home] Started download of h0021B_1_40053.ini
2006-02-28 23:15:17 [Predictor @ Home] Started download of h0021B_1_40053.inp
2006-02-28 23:15:19 [Predictor @ Home] Finished download of h0021B_1_40053.ini
2006-02-28 23:15:19 [Predictor @ Home] Throughput 6613 bytes/sec
2006-02-28 23:15:19 [Predictor @ Home] Finished download of h0021B_1_40053.inp
2006-02-28 23:15:19 [Predictor @ Home] Throughput 764 bytes/sec
2006-02-28 23:15:19 [Predictor @ Home] Started download of h0021B_1_40053.seq
2006-02-28 23:15:19 [Predictor @ Home] Started download of h0021B_1_40053.res
2006-02-28 23:15:21 [Predictor @ Home] Finished download of h0021B_1_40053.seq
2006-02-28 23:15:21 [Predictor @ Home] Throughput 5561 bytes/sec
2006-02-28 23:15:21 [Predictor @ Home] Finished download of h0021B_1_40053.res
2006-02-28 23:15:21 [Predictor @ Home] Throughput 16 bytes/sec
2006-02-28 23:15:22 [---] request_reschedule_cpus: files downloaded
2006-02-28 23:15:22 [Predictor @ Home] Pausing result h0021B_1_34605_1 (removed from memory)
2006-02-28 23:15:22 [SETI@home] Restarting result 26my01aa.6598.18914.61078.1.184_0 using setiathome version 418
2006-02-28 23:15:23 [---] request_reschedule_cpus: process exited
ID: 20935

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 20942 - Posted: 1 Mar 2006, 22:14:16 UTC
Well, that explains the slow progress. Your PC is busy with other projects. I'm still working on why your message of the 28th isn't on the server.
ID: 20942

MikeMarsUK Volunteer moderator Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0
Message 20946 - Posted: 1 Mar 2006, 23:13:09 UTC - in response to Message 20935.
There are no significant errors in the log. Are there any entries in there from when your project failed?
> 2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)
I did notice the 'removed from memory', though; that'd be slowing you down (but won't be involved in your project's failure).
> ...
> Thank you. Here it is: http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
> 2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
> 2006-02-28 22:02:25 [LHC@home] Project is down
> ...
ID: 20946

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20947 - Posted: 1 Mar 2006, 23:18:19 UTC - in response to Message 20942.
> Well that explains the slow progress. Your pc is busy with other projects. I'm still working on why your message of the 28th isn't on the server.
Les,
Things really have not changed. I have been running lots of projects for quite a while. Climate used to update fairly regularly with no problem. It is only since December that the climate project hasn't trickled at all and has bloated to a large proportion. To repeat, nothing else has changed as far as BOINC is concerned. One other thing: since this issue started a few days ago, I have not received any new work from climate. All other projects are proceeding along par for the course.
Clay
ID: 20947

old_user9110 Joined: 2 Sep 04 Posts: 7 Credit: 88,019 RAC: 0
Message 20948 - Posted: 2 Mar 2006, 0:03:06 UTC - in response to Message 20946.
> There are no significant errors in the log, are there any entries in there from when your project failed?
> > 2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)
> I did notice the 'removed from memory' though, that'd be slowing you down (but won't be involved in your project's failure).
> > ...
> > Thank you. Here it is: http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
> > 2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
> > 2006-02-28 22:02:25 [LHC@home] Project is down
> > ...
This looks like it:
2006-02-26 08:48:09 [LHC@home] No work from project
2006-02-26 08:49:14 [LHC@home] No work from project
2006-02-26 08:51:44 [LHC@home] No work from project
2006-02-26 18:24:02 [SZTAKI Desktop Grid] Project is down
2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:24 [Einstein@Home] Unrecoverable error for result z1_0953.5__781_S4R2a_2 ( - exit code -1073741502 (0xc0000142))
2006-02-27 18:41:44 [LHC@home] No work from project
2006-02-27 18:42:44 [LHC@home] Fetching master file
2006-02-27 18:43:00 [LHC@home] No work from project
2006-02-27 18:44:05 [LHC@home] No work from project
2006-02-27 18:45:11 [LHC@home] No wo
and
2006-02-27 06:20:08 [SETI@home] Reason: To report results
2006-02-27 06:20:08 [SETI@home] Reporting 1 results
2006-02-27 06:20:13 [SETI@home] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2006-02-27 09:30:11 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:11 [Einstein@Home] Computation for result z1_0953.5__790_S4R2a_1 finished
2006-02-27 09:30:11 [climateprediction.net] Restarting result 3puv_100195221_1 using hadsm3 version 413
2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:12 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:12 [climateprediction.net] Computation for result 3puv_100195221_1 finished
2006-02-27 09:30:12 [---] Allowing work fetch again.
2006-02-27 09:30:13 [Einstein@Home] Started upload of z1_0953.5__790_S4R2a_1_0
2006-02-27 09:30:16 [Einstein@Home] Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
2006-02-27 09:30:16 [Einstein@Home] Reason: To fetch work
2006-02-27 09:30:16 [Einstein@Home] Requesting 17280 seconds of new work
2006-02-27 09:30:21 [Einstein@Home] Finished upload of z1_0953.5__790_S4R2a_1_0
2006-02-27 09:30:21 [Einstein@Home] Throughput 39405 bytes/sec
2006-02-27 09:30:21 [Einstein@Home] Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
2006-02-27 09:30:23 [---] request_reschedule_cpus: files downloaded
2006-02-27 09:30:23 [---] Suspending work fetch because computer is overcommitted.
2006-02-27 09:30:23 [Einstein@Home] Starting result z1_0953.5__781_S4R2a_2 using albert version 437
2006-02-27 09:30:24 [Einstein@Home] Unrecoverable error for result z1_0953.5__781_S4R2a_2 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:24 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:24 [Einstein@Home] Computation for result z1_0953.5__781_S4R2a_2 finished
2006-02-27 09:30:24 [---] Allowing work fetch again.
2006-02-27 09:30:26 [SETI@home] Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2006-02-27 09:30:26 [SETI@home] Reason: To fetch work
2006-02-27 09:30:26 [SETI@home] Requesting 17280 seconds of new work
2006-02-27 09:30:28 [---] Exit requested by user
2006-02-27 09:30:29 [---] request_reschedule_cpus: exit_tasks
Thanks,
Clay
ID: 20948
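Failure entries like the two above can be picked out of a long client log by filtering on the project name and the phrase "Unrecoverable error". The sketch below assumes every entry has the form `YYYY-MM-DD HH:MM:SS [project] message`, as in the excerpts pasted in this thread; `find_errors` and the regular expression are illustrative names, not anything BOINC provides.

```python
import re

# Assumed entry layout, inferred from the excerpts in this thread:
#   2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result ...
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")


def find_errors(lines, project=None, needle="Unrecoverable error"):
    """Return (timestamp, project, message) tuples for entries containing needle."""
    hits = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip truncated or malformed entries
        stamp, name, message = match.groups()
        if needle in message and (project is None or name == project):
            hits.append((stamp, name, message))
    return hits


# Sample entries copied from the log excerpt in this thread.
sample = [
    "2006-02-26 18:24:02 [SZTAKI Desktop Grid] Project is down",
    "2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502 (0xc0000142))",
    "2006-02-27 09:30:24 [Einstein@Home] Unrecoverable error for result z1_0953.5__781_S4R2a_2 ( - exit code -1073741502 (0xc0000142))",
    "2006-02-27 18:45:11 [LHC@home] No wo",
]

for stamp, name, message in find_errors(sample, project="climateprediction.net"):
    print(stamp, name, message)
```

Leaving `project` unset returns failures from every project, which here shows that Einstein@Home failed with the same exit code twelve seconds after the climate model did.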

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 20962 - Posted: 2 Mar 2006, 8:35:38 UTC
I don't have all the answers, and possibly not all the questions. However, here are a few fragments. From your messages:
> Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502
The error code is rather obscure, but may have something to do with a Microsoft conflict. Upgrading to the latest drivers for the video card has helped a few people.
Also, model 3puv_100195221_1 was issued to you on 15 Nov 2005 and last trickled on 12 Jan 2006 08:19:06. There were 49 trickles, just into the start of phase 3, and the phase 1 and 2 graphs are on the page for that model. It is Result ID 1191626 if you want to look at it. What it has been doing between the trickle on 12th Jan and the crash on 27th Feb may never be known.
You have two other models currently allotted to your computer: sulphur_dm7r_000635319_0, which is Result ID 1310656, issued 29 Nov 2005, with one trickle on 01 Dec 2005 03:49:34; and sulphur_dzmm_000652702_0, which is Result ID 1328152, issued 2 Dec 2005, with no trickles.
Anything else is a bit of a mystery, and I don't know what to advise. I only run cpdn, so I don't have the problem of BOINC trying to juggle the time share allocation against all the deadlines, etc. I think what I would do in your position is set all projects to 'No new work', suspend any climate models shown, and let all the others run down. Then I'd unsuspend the climate models and try to get them working, perhaps with Update. Once they seemed to be working, with a few days of trickles, I'd try for some of the other projects.
The only other suggestion is Good Luck. You may need it.
ID: 20962
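On the error code itself: the log prints the same value twice, once as a signed 32-bit integer (-1073741502) and once in hexadecimal (0xc0000142). The small sketch below shows how the two forms relate; the helper names are hypothetical. In Microsoft's NTSTATUS listing, 0xC0000142 is named STATUS_DLL_INIT_FAILED, meaning a DLL failed to initialise while the process was starting. That narrows the symptom but does not by itself identify the cause on this particular machine.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


exit_code = -1073741502  # as printed in the client log
print(hex(to_unsigned32(exit_code)))  # prints 0xc0000142
print(to_signed32(0xC0000142))        # prints -1073741502
```

Searching for the hexadecimal form usually turns up more documentation than the negative decimal number does.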