climateprediction.net (CPDN) home page
Thread '615 hours of work disappeared'

old_user9110
Joined: 2 Sep 04
Posts: 7
Credit: 88,019
RAC: 0
Message 20809 - Posted: 27 Feb 2006, 18:18:50 UTC

Today, after 615 hours of work and with some 500 left to go, climate uploaded a result. The remaining 500 hours went puff on my system, and I can find no result for the upload on the results page.

What happened?


MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 20811 - Posted: 27 Feb 2006, 19:18:58 UTC
Last modified: 27 Feb 2006, 19:31:28 UTC

Was there anything useful in the 'Messages' tab? (Or, if you've restarted since, in the log files, something like stderrgui.txt or stderrdae.txt?)

I don't see any trickles since the beginning of December. Were you running off-line?
I'm a volunteer and my views are my own.

old_user9110
Message 20813 - Posted: 27 Feb 2006, 20:51:33 UTC - in response to Message 20811.  

> Was there anything useful in the 'messages' tab? (or if you've restarted since, in the log files? (something like stderrgui.txt, stderrdae.txt)).
>
> I don't see any trickles since the beginning of December, were you running off-line?

No, running on-line all the time as usual. The Messages tab looks normal. It says "Reporting 1 result" at 09:39:50 on 27 Feb 2006 (EST).

Have not trickled for 615 hours. Figured climate changed.
Begs the question "why are the work units so preposterously large"?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 20815 - Posted: 27 Feb 2006, 21:12:52 UTC

> Begs the question "why are the work units so preposterously large"?

The FAQ.
You are modeling the Earth's atmosphere. This is not a trivial matter.

old_user9110
Message 20833 - Posted: 28 Feb 2006, 4:24:11 UTC - in response to Message 20815.  

> > Begs the question "why are the work units so preposterously large"?
>
> The FAQ.
> You are modeling the Earth's atmosphere. This is not a trivial matter.

If it were trivial I wouldn't be wasting my time with it. But no one has addressed the issue: what happened?

Les Bayliss
Volunteer moderator
Message 20834 - Posted: 28 Feb 2006, 5:03:08 UTC

I was pointing out that the WUs aren't "preposterously large"; their size is just normal for the work being undertaken.

As far as anyone can tell from looking at the pages for the 2 models currently shown as still active on your computer, nothing has gone wrong. Which is why you were asked to provide the messages.

The two files are stdoutdae.txt and stderrdae.txt, both in the BOINC folder.
Until we have more info, there's nothing we can tell you, other than that:

sulphur_dm7r_000635319_0 had its first trickle on 01 Dec 2005 03:49:34, and nothing since. If it's still running, there should be a large number of trickle_up_... files in the folder of this model.

sulphur_dzmm_000652702_0 was sent to you on 2 Dec 2005 12:24:43 UTC, and shows no trickles. The same applies to trickle files for this model.
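
One way to check for those files is to count the entries whose names start with trickle_up_ in the model's folder. The following is only a minimal Python sketch: the function name is made up for illustration, and the folder path has to be supplied by whoever runs it, since the thread does not say where the model folders live.

```python
from pathlib import Path


def count_trickle_files(model_dir):
    """Count files named trickle_up_* directly inside model_dir."""
    folder = Path(model_dir)
    # Only regular files are counted; subfolders are ignored.
    return sum(1 for entry in folder.glob("trickle_up_*") if entry.is_file())
```

Calling count_trickle_files() with the path of a model's folder returns 0 if no trickle files are present there.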

old_user9110
Message 20883 - Posted: 1 Mar 2006, 4:53:50 UTC - in response to Message 20833.  

> > > Begs the question "why are the work units so preposterously large"?
> >
> > The FAQ.
> > You are modeling the Earth's atmosphere. This is not a trivial matter.
>
> If it were trivial I wouldn't be wasting my time with it. But no one has addressed the issue: what happened?

Right. Where would you like me to send the files?

Les Bayliss
Volunteer moderator
Message 20921 - Posted: 1 Mar 2006, 19:01:21 UTC

There's something odd about the dates/times listed on your account, so I'm looking into it.

Re: sending files. If you mean trickles, BOINC does that. If you mean the std... files, I just want a copy and paste of some of the messages at the end. 10-20 lines from stdoutdae.txt should do.
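
For anyone who would rather not scroll through a long log by hand, the last few lines can be pulled out with a short script. This is a hedged Python sketch, not a BOINC tool: the function name is hypothetical, and it assumes the log is plain text that can be read from wherever the script is run.

```python
from collections import deque


def last_lines(path, count=20):
    """Return the final `count` lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent `count` lines.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Run from the BOINC folder, print("\n".join(last_lines("stdoutdae.txt"))) would give text ready to paste into a reply.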

old_user9110
Message 20935 - Posted: 1 Mar 2006, 21:14:27 UTC - in response to Message 20921.  

> There's something odd about the dates/times listed on your account, so I'm looking into it.
>
> Re: sending files. If you mean trickles, BOINC does that. If you mean the std... files, I just want a copy and paste of some of the messages at the end. 10-20 lines from stdoutdae.txt should do.

Thank you. Here it is:

http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
2006-02-28 22:02:25 [LHC@home] Project is down
2006-02-28 22:26:57 [SETI@home] Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2006-02-28 22:26:57 [SETI@home] Reason: To report results
2006-02-28 22:26:57 [SETI@home] Reporting 2 results
2006-02-28 22:27:02 [SETI@home] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2006-02-28 22:35:32 [boincsimap] Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
2006-02-28 22:35:32 [boincsimap] Reason: To fetch work
2006-02-28 22:35:32 [boincsimap] Requesting 7 seconds of new work, and reporting 1 results
2006-02-28 22:35:37 [boincsimap] Scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi failed with a return value of 500
2006-02-28 22:35:37 [boincsimap] No schedulers responded
2006-02-28 22:36:38 [boincsimap] Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
2006-02-28 22:36:38 [boincsimap] Reason: To fetch work
2006-02-28 22:36:38 [boincsimap] Requesting 1333 seconds of new work, and reporting 1 results
2006-02-28 22:36:43 [boincsimap] Scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi succeeded
2006-02-28 22:36:45 [boincsimap] Started download of 60301211.025713
2006-02-28 22:36:50 [boincsimap] Finished download of 60301211.025713
2006-02-28 22:36:50 [boincsimap] Throughput 409085 bytes/sec
2006-02-28 22:36:51 [---] request_reschedule_cpus: files downloaded
2006-02-28 22:36:51 [Predictor @ Home] Restarting result h0021B_1_34605_1 using mfoldB125 version 428
2006-02-28 22:36:51 [Einstein@Home] Restarting result z1_0953.5__768_S4R2a_2 using albert version 437
2006-02-28 22:36:51 [SETI@home] Pausing result 26my01aa.6598.18914.61078.1.184_0 (removed from memory)
2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)
2006-02-28 22:36:52 [---] request_reschedule_cpus: process exited
2006-02-28 23:02:29 [LHC@home] Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi
2006-02-28 23:02:29 [LHC@home] Reason: To fetch work
2006-02-28 23:02:29 [LHC@home] Requesting 17280 seconds of new work
2006-02-28 23:02:34 [LHC@home] Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
2006-02-28 23:02:34 [LHC@home] Message from server: Server can't open database
2006-02-28 23:02:34 [LHC@home] Project is down
2006-02-28 23:09:40 [Predictor @ Home] Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
2006-02-28 23:09:40 [Predictor @ Home] Reason: To report results
2006-02-28 23:09:40 [Predictor @ Home] Reporting 1 results
2006-02-28 23:09:45 [Predictor @ Home] Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
2006-02-28 23:15:10 [Predictor @ Home] Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
2006-02-28 23:15:10 [Predictor @ Home] Reason: To fetch work
2006-02-28 23:15:10 [Predictor @ Home] Requesting 16 seconds of new work
2006-02-28 23:15:15 [Predictor @ Home] Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded
2006-02-28 23:15:17 [Predictor @ Home] Started download of h0021B_1_40053.ini
2006-02-28 23:15:17 [Predictor @ Home] Started download of h0021B_1_40053.inp
2006-02-28 23:15:19 [Predictor @ Home] Finished download of h0021B_1_40053.ini
2006-02-28 23:15:19 [Predictor @ Home] Throughput 6613 bytes/sec
2006-02-28 23:15:19 [Predictor @ Home] Finished download of h0021B_1_40053.inp
2006-02-28 23:15:19 [Predictor @ Home] Throughput 764 bytes/sec
2006-02-28 23:15:19 [Predictor @ Home] Started download of h0021B_1_40053.seq
2006-02-28 23:15:19 [Predictor @ Home] Started download of h0021B_1_40053.res
2006-02-28 23:15:21 [Predictor @ Home] Finished download of h0021B_1_40053.seq
2006-02-28 23:15:21 [Predictor @ Home] Throughput 5561 bytes/sec
2006-02-28 23:15:21 [Predictor @ Home] Finished download of h0021B_1_40053.res
2006-02-28 23:15:21 [Predictor @ Home] Throughput 16 bytes/sec
2006-02-28 23:15:22 [---] request_reschedule_cpus: files downloaded
2006-02-28 23:15:22 [Predictor @ Home] Pausing result h0021B_1_34605_1 (removed from memory)
2006-02-28 23:15:22 [SETI@home] Restarting result 26my01aa.6598.18914.61078.1.184_0 using setiathome version 418
2006-02-28 23:15:23 [---] request_reschedule_cpus: process exited

Les Bayliss
Volunteer moderator
Message 20942 - Posted: 1 Mar 2006, 22:14:16 UTC

Well, that explains the slow progress. Your PC is busy with other projects.
I'm still working on why your message of the 28th isn't on the server.

MikeMarsUK
Volunteer moderator
Message 20946 - Posted: 1 Mar 2006, 23:13:09 UTC - in response to Message 20935.  

There are no significant errors in the log. Are there any entries in there from when your project failed?

2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)

I did notice the 'removed from memory' though; that would be slowing you down (but won't be involved in your project's failure).
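
Hunting for the failure entries can be partly automated. The sketch below is an illustration in Python, not part of BOINC: it assumes every log line follows the "timestamp [project] message" layout of the excerpt quoted in this thread, and the default search text is chosen to match the failure lines that appear later in the thread. The names LOG_LINE and find_errors are hypothetical.

```python
import re

# Timestamp, [project], message: the layout of the log lines quoted in this thread.
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")


def find_errors(lines, needle="Unrecoverable error"):
    """Return (timestamp, project, message) for each log line containing needle."""
    matches = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        # Lines that do not fit the expected layout are skipped.
        if parsed and needle in parsed.group(3):
            matches.append(parsed.groups())
    return matches
```

Passing the lines of stdoutdae.txt to find_errors() returns one tuple per matching entry, which makes it easy to see which project failed and when.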

> ...
> Thank you. Here it is:
>
> http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
> 2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
> 2006-02-28 22:02:25 [LHC@home] Project is down
> ...


I'm a volunteer and my views are my own.

old_user9110
Message 20947 - Posted: 1 Mar 2006, 23:18:19 UTC - in response to Message 20942.  

> Well that explains the slow progress. Your pc is busy with other projects.
> I'm still working on why your message of the 28th isn't on the server.

Les,

Things really have not changed. I have been running lots of projects for quite a while. Climate used to update fairly regularly, no problem. It is only since December that the climate project didn't trickle at all and bloated to a large proportion. To repeat, nothing else has changed as far as BOINC is concerned.

One other thing: since this issue started a few days ago, I have not received any new work from climate. All other projects are proceeding along, par for the course.

Clay

old_user9110
Message 20948 - Posted: 2 Mar 2006, 0:03:06 UTC - in response to Message 20946.  

> There are no significant errors in the log, are there any entries in there from when your project failed?
>
> 2006-02-28 22:36:51 [boincsimap] Pausing result 60301211.023530_0 (removed from memory)
>
> I did notice the 'removed from memory' though, that'd be slowing you down (but won't be involved in your project's failure).
>
> > ...
> > Thank you. Here it is:
> >
> > http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded
> > 2006-02-28 22:02:25 [LHC@home] Message from server: Project is temporarily shut down for maintenance
> > 2006-02-28 22:02:25 [LHC@home] Project is down
> > ...

This looks like it:
2006-02-26 08:48:09 [LHC@home] No work from project
2006-02-26 08:49:14 [LHC@home] No work from project
2006-02-26 08:51:44 [LHC@home] No work from project
2006-02-26 18:24:02 [SZTAKI Desktop Grid] Project is down
2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:24 [Einstein@Home] Unrecoverable error for result z1_0953.5__781_S4R2a_2 ( - exit code -1073741502 (0xc0000142))
2006-02-27 18:41:44 [LHC@home] No work from project
2006-02-27 18:42:44 [LHC@home] Fetching master file
2006-02-27 18:43:00 [LHC@home] No work from project
2006-02-27 18:44:05 [LHC@home] No work from project
2006-02-27 18:45:11 [LHC@home] No wo

and

2006-02-27 06:20:08 [SETI@home] Reason: To report results
2006-02-27 06:20:08 [SETI@home] Reporting 1 results
2006-02-27 06:20:13 [SETI@home] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
2006-02-27 09:30:11 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:11 [Einstein@Home] Computation for result z1_0953.5__790_S4R2a_1 finished
2006-02-27 09:30:11 [climateprediction.net] Restarting result 3puv_100195221_1 using hadsm3 version 413
2006-02-27 09:30:12 [climateprediction.net] Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:12 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:12 [climateprediction.net] Computation for result 3puv_100195221_1 finished
2006-02-27 09:30:12 [---] Allowing work fetch again.
2006-02-27 09:30:13 [Einstein@Home] Started upload of z1_0953.5__790_S4R2a_1_0
2006-02-27 09:30:16 [Einstein@Home] Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
2006-02-27 09:30:16 [Einstein@Home] Reason: To fetch work
2006-02-27 09:30:16 [Einstein@Home] Requesting 17280 seconds of new work
2006-02-27 09:30:21 [Einstein@Home] Finished upload of z1_0953.5__790_S4R2a_1_0
2006-02-27 09:30:21 [Einstein@Home] Throughput 39405 bytes/sec
2006-02-27 09:30:21 [Einstein@Home] Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
2006-02-27 09:30:23 [---] request_reschedule_cpus: files downloaded
2006-02-27 09:30:23 [---] Suspending work fetch because computer is overcommitted.
2006-02-27 09:30:23 [Einstein@Home] Starting result z1_0953.5__781_S4R2a_2 using albert version 437
2006-02-27 09:30:24 [Einstein@Home] Unrecoverable error for result z1_0953.5__781_S4R2a_2 ( - exit code -1073741502 (0xc0000142))
2006-02-27 09:30:24 [---] request_reschedule_cpus: process exited
2006-02-27 09:30:24 [Einstein@Home] Computation for result z1_0953.5__781_S4R2a_2 finished
2006-02-27 09:30:24 [---] Allowing work fetch again.
2006-02-27 09:30:26 [SETI@home] Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
2006-02-27 09:30:26 [SETI@home] Reason: To fetch work
2006-02-27 09:30:26 [SETI@home] Requesting 17280 seconds of new work
2006-02-27 09:30:28 [---] Exit requested by user
2006-02-27 09:30:29 [---] request_reschedule_cpus: exit_tasks

Thanks,

Clay

Les Bayliss
Volunteer moderator
Message 20962 - Posted: 2 Mar 2006, 8:35:38 UTC

I don't have all the answers. And possibly not all the questions.
However here are a few fragments.

From your messages:
> Unrecoverable error for result 3puv_100195221_1 ( - exit code -1073741502

The error code is rather obscure, but may have something to do with a Microsoft conflict. Upgrading to the latest drivers for the video card has helped a few people.
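
The two numbers in that log line are the same value written two ways: the negative decimal is the signed 32-bit reading of the hexadecimal code. A minimal Python sketch of the conversion follows; the function name is made up for illustration. As far as I know, Microsoft's NTSTATUS list gives 0xC0000142 as STATUS_DLL_INIT_FAILED (a DLL failed to initialize), but treat that reading as a pointer for further searching rather than a diagnosis.

```python
def exit_code_to_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their 32-bit unsigned form.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(exit_code_to_hex(-1073741502))  # prints 0xc0000142
```

Searching for the hexadecimal form usually turns up more than searching for the negative decimal.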

Also, model 3puv_100195221_1 was issued to you on 15 Nov 2005 and last trickled on 12 Jan 2006 08:19:06
There were 49 trickles, just into the start of phase 3, and the phase 1 and 2 graphs are on the page for that model. It is Result ID 1191626 if you want to look at it.

What it has been doing between the trickle on 12th Jan, and the crash on 27th Feb may never be known.

You have two other models currently allotted to your computer:
sulphur_dm7r_000635319_0, which is Result ID 1310656, issued 29 Nov 2005, with one trickle on 01 Dec 2005 03:49:34,
and sulphur_dzmm_000652702_0, which is Result ID 1328152, issued 2 Dec 2005, with no trickles.

Anything else is a bit of a mystery, and I don't know what to advise.
I only run CPDN, so I don't have the problem of BOINC trying to juggle the time share allocation against all the deadlines, etc.

I think what I would do in your position is set all projects to 'No new work', suspend any climate models shown, and let all the others run down.
Then I'd unsuspend the climate models and try to get them working, perhaps with Update. Once they seemed to be working, with a few days of trickles, I'd try for some of the other projects.

The only other suggestion is Good Luck. You may need it.
