climateprediction.net (CPDN) home page
Thread 'Result exited with zero status?'

old_user909

Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 5216 - Posted: 11 Oct 2004, 5:49:04 UTC
Last modified: 11 Oct 2004, 5:57:09 UTC

Thanks to LHC having no work, this box is now only doing CPDN. Just yesterday I finally got to phase 2. Today I noticed some strange error messages that I have seen before with seti (or was it lhc?) but never before with CPDN. Anyone else seen this?

================================================
climateprediction.net - 2004-10-10 21:11:58 - Result 2vqa_000155790_0 exited with zero status but no 'finished' file
climateprediction.net - 2004-10-10 21:11:58 - If this happens repeatedly you may need to reset the project.
climateprediction.net - 2004-10-10 21:11:58 - Restarting result 2vqa_000155790_0 using hadsm3 version 4.03
climateprediction.net - 2004-10-10 21:30:01 - Result 2vqa_000155790_0 exited with zero status but no 'finished' file
climateprediction.net - 2004-10-10 21:30:01 - If this happens repeatedly you may need to reset the project.
climateprediction.net - 2004-10-10 21:30:01 - Restarting result 2vqa_000155790_0 using hadsm3 version 4.03
================================================

The model appears to be still crunching away and making progress (at 27,000 in phase 2 so far) but this does not look promising. I have seen it at least 5 or 6 times now. May have missed a couple too.

AMD 2400+ with a gig of RAM
attached to LHC and CPDN
BOINC v4.09
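
Since a few of these messages may have gone unnoticed, one way to tally them is to scan a saved copy of the message log. The following is a minimal sketch, assuming the log has been saved as plain text lines; the function name `count_unfinished_exits` is hypothetical, and only the message wording is taken from the log excerpt above.

```python
import re
from collections import Counter

# Message wording copied from the log excerpt above; the result name is captured.
PATTERN = re.compile(
    r"Result (\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Count, per result name, how often the warning appears (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The pattern matches only the message text, not the timestamp or project prefix, so it should not depend on how a particular client version lays out each line.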
----------------------------
A member of The Knights Who Say Ni!
My BOINC stats site: http://boinc-kwsn.no-ip.info
old_user760
Joined: 10 Aug 04
Posts: 94
Credit: 309,849
RAC: 0
Message 5217 - Posted: 11 Oct 2004, 7:18:50 UTC

I had to junk one WU for the same reason. Got the same message.

old_user909

Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 5219 - Posted: 11 Oct 2004, 7:51:42 UTC

hmm... Well that isn't encouraging. Since it is still making progress (now up to 29,000 TS) I'm hoping I won't have to reset. I have already lost 3 models due to CPU/motherboard problems. I really want to finish one for once! :)
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 5223 - Posted: 11 Oct 2004, 9:20:15 UTC - in response to Message 5216.  
Last modified: 11 Oct 2004, 9:24:37 UTC

The error is generated whenever BOINC detects an abnormal termination of a project program (when a program terminates normally it should create a boinc_finish_called file in its slots directory). Is there anything in the stdout or stderr files in the BOINC or climateprediction.net/{result_id} directories to indicate what caused the CPDN model to stop running?
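
The check described above can be sketched roughly as follows. This is an illustration only, not the BOINC client's actual code: the function name `classify_exit` and the returned labels are hypothetical, and treating any non-zero status as an error is an assumption. Only the `boinc_finish_called` file name comes from the description.

```python
from pathlib import Path

FINISH_FILE = "boinc_finish_called"  # marker file named in the post above


def classify_exit(slot_dir, exit_status):
    """Classify how a project program ended (hypothetical helper).

    A zero exit status only counts as a normal finish if the marker
    file exists in the slot directory.
    """
    finished = (Path(slot_dir) / FINISH_FILE).exists()
    if exit_status == 0 and finished:
        return "finished"
    if exit_status == 0:
        # The case behind "exited with zero status but no 'finished' file":
        # the log shows the result being restarted.
        return "restart"
    # Assumption: any non-zero status is reported as an error.
    return "error"
```

For example, `classify_exit(slot_dir, 0)` on a slot directory without the marker file returns `"restart"`, which corresponds to the "Restarting result" line in the log.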
old_user909

Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 5243 - Posted: 11 Oct 2004, 16:51:49 UTC

The only thing in the boinc stderr.txt is some DNS errors thanks to major network problems my ISP had yesterday (down for about 9 hours). stderr_um.txt in projects\climateprediction.net\2vqa_000155790 has a bunch of lines that read
"OPEN: File dataout/2vqaba.da27bs0 Created on Unit 22"
Then there are these:

CLOSE: WARNING: Unit 60 Not Opened
OPEN: File dataout/2vqaba.pa28c10 Created on Unit 60
CLOSE: WARNING: Unit 63 Not Opened
OPEN: File dataout/2vqaba.pd28c10 Created on Unit 63

and a few more like it. That is the only oddity I see anywhere. Up to TS 40,000 and I don't think the error came up at all last night. Very random it seems.
old_user355

Joined: 7 Aug 04
Posts: 187
Credit: 44,163
RAC: 0
Message 5246 - Posted: 11 Oct 2004, 18:20:17 UTC

I see the "no finished file" error sometimes, but the WU always continues processing.

astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 5248 - Posted: 11 Oct 2004, 18:40:25 UTC

Hi, Toby,

I had a similar experience to Heffed on one machine early in Phase 1 of the current pair of Models. Since then, the machine seems to have behaved itself.

I don't recall which machine, whether M$/XP or Linux, but all eight of my Models (four HT machines) are in Phase 2 or 3. So, I think the odds are in your favor for a successful run. (My fingers are crossed for all of us!)


We have met the enemy and he is us -- Pogo
old_user909

Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 5252 - Posted: 11 Oct 2004, 19:46:20 UTC - in response to Message 5248.  

> (My fingers are crossed for all of us!)

Sweet! I kind of need my fingers uncrossed for work so thanks for doing that for us. Maybe I'll knock on wood instead :)

But it is good to hear that others have seen this and that it doesn't seem to be detrimental to the project.

This is happening on my Windows XP SP1 box (haven't had the balls to try SP2 yet :) ), although I have seen the same error on my Gentoo Linux box (2.6.8) while running seti.
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 5268 - Posted: 12 Oct 2004, 8:17:51 UTC - in response to Message 5243.  

> The only thing in the boinc stderr.txt is some DNS errors thanks to major
> network problems my ISP had yesterday (down for about 9 hours). stderr_um.txt
> in projects\climateprediction.net\2vqa_000155790 has a bunch of lines that
> read
> "OPEN: File dataout/2vqaba.da27bs0 Created on Unit 22"
> Then there are these:
>
> CLOSE: WARNING: Unit 60 Not Opened
> OPEN: File dataout/2vqaba.pa28c10 Created on Unit 60
> CLOSE: WARNING: Unit 63 Not Opened
> OPEN: File dataout/2vqaba.pd28c10 Created on Unit 63

There's nothing there to worry about, Toby. It's just a warning that hadsm3um is trying to close a file that it hasn't created yet.
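
For anyone who wants to confirm that every such warning in stderr_um.txt fits this harmless pattern, here is a minimal sketch. It assumes the benign case is a "Not Opened" warning followed by an OPEN on the same unit number, as in the quoted excerpt; the function name `unmatched_close_warnings` is hypothetical.

```python
import re

# Line formats copied from the quoted stderr_um.txt excerpt.
CLOSE_RE = re.compile(r"CLOSE: WARNING: Unit (\d+) Not Opened")
OPEN_RE = re.compile(r"OPEN: File (\S+) Created on Unit (\d+)")


def unmatched_close_warnings(lines):
    """Return unit numbers whose close warning is not followed by an
    OPEN on the same unit (hypothetical helper)."""
    unmatched = []
    pending = None  # unit from the most recent warning still awaiting its OPEN
    for line in lines:
        close = CLOSE_RE.search(line)
        if close:
            if pending is not None:
                unmatched.append(pending)
            pending = close.group(1)
            continue
        opened = OPEN_RE.search(line)
        if opened and pending is not None:
            if opened.group(2) != pending:
                unmatched.append(pending)
            pending = None
    if pending is not None:
        unmatched.append(pending)
    return unmatched
```

An empty list means every warning was followed by the matching OPEN; anything else would be worth a closer look.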
old_user28601

Joined: 5 Nov 04
Posts: 19
Credit: 88,724
RAC: 0
Message 6203 - Posted: 18 Nov 2004, 7:47:24 UTC

I'm having the same error on all my machines... :-(

-
2004-11-17 12:32:52 [climateprediction.net] Result 3ev9_100180838_0 exited with zero status but no 'finished' file
2004-11-17 12:32:52 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2004-11-17 12:32:52 [climateprediction.net] Restarting result 3ev9_100180838_0 using hadsm3 version 4.04
2004-11-17 14:52:43 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-11-17 14:52:46 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-11-17 17:26:49 [climateprediction.net] Result 3ev9_100180838_0 exited with zero status but no 'finished' file
2004-11-17 17:26:49 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2004-11-17 17:26:49 [climateprediction.net] Restarting result 3ev9_100180838_0 using hadsm3 version 4.04
2004-11-17 22:50:15 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-11-17 22:50:19 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
2004-11-18 06:21:31 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2004-11-18 06:21:34 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
-
This is the log from my first machine. The error has occurred repeatedly (4 times) in the last couple of days.
My second machine is running 4 units at the same time (dual Xeon with hyperthreading). Yesterday, it gave the same error for all units it is working on (after having spent 140 hours on each unit).

On both machines, the stderr.txt is empty...

Any suggestions?


Jörg
old_user29560

Joined: 12 Nov 04
Posts: 3
Credit: 3,374
RAC: 0
Message 6235 - Posted: 19 Nov 2004, 15:37:35 UTC

I've seen it happen with my Seti WUs. It's just a BOINC thing, I believe, and it only happens when you start it up after it has been shut down. All I did was suspend the work, then change it back to run.