Thread 'Exited with zero status- reason to worry?'

old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25840 - Posted: 4 Jan 2007, 21:06:54 UTC

Hi guys,

I'm getting repeated "exited with zero status" messages for my model. The first one, which happened this afternoon (CET), is probably due to my computer synchronizing its clock and going back three minutes. I remember the model reacting the same way to the daylight saving change, without any further problems. But I've had three more of those messages since then. Do I have to worry? I do have backups, but as the newest one is a couple of days old I don't want to reset if I don't have to. The model seems to have trickled normally despite the error messages.
ID: 25840 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25841 - Posted: 4 Jan 2007, 21:37:05 UTC

You can do one of two things:

1) Follow the instructions, and lose the model.

2) Leave it alone and maybe lose the model. And maybe not.

The message is an advisory, not a warning, and is a leftover from a very early version of BOINC, which is now being triggered by Windows for whatever reason. Or perhaps no reason at all.

And you can always make another backup RIGHT NOW.

ID: 25841 · Report as offensive     Reply Quote
old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25842 - Posted: 4 Jan 2007, 21:48:30 UTC
Last modified: 4 Jan 2007, 21:48:47 UTC

Okay, thanks a lot. This means I'll go into the normal ignore mode for flaming Windoze apps ;-) and hope for the best. Maybe changing the system time can trigger multiple messages of that kind instead of just one...
ID: 25842 · Report as offensive     Reply Quote
old_user113466

Send message
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 25948 - Posted: 10 Jan 2007, 21:31:24 UTC

I have the same zero status message, and the time to completion is now ticking up!!
What to do?

Thanks
DP

1/10/2007 3:21:50 PM|climateprediction.net|Task hadcm3lbm_91jc_25191163_0 exited with zero status but no 'finished' file
1/10/2007 3:21:50 PM|climateprediction.net|If this happens repeatedly you may need to reset the project.
1/10/2007 3:21:50 PM||Rescheduling CPU: application exited
1/10/2007 3:21:50 PM|climateprediction.net|Restarting task hadcm3lbm_91jc_25191163_0 using hadcm3lb version 508
1/10/2007 3:31:36 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/10/2007 3:31:36 PM|climateprediction.net|Reason: To send trickle-up message
1/10/2007 3:31:36 PM|climateprediction.net|(not requesting new work or reporting completed tasks)
1/10/2007 3:31:45 PM|climateprediction.net|Scheduler request succeeded
1/10/2007 3:43:50 PM|climateprediction.net|Task hadcm3lbm_91jc_25191163_0 exited with zero status but no 'finished' file
1/10/2007 3:43:50 PM|climateprediction.net|If this happens repeatedly you may need to reset the project.
1/10/2007 3:43:50 PM||Rescheduling CPU: application exited
1/10/2007 3:43:50 PM|climateprediction.net|Restarting task hadcm3lbm_91jc_25191163_0 using hadcm3lb version 508
1/10/2007 3:43:51 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/10/2007 3:43:51 PM|climateprediction.net|Reason: To send trickle-up message
1/10/2007 3:43:51 PM|climateprediction.net|(not requesting new work or reporting completed tasks)
1/10/2007 3:43:56 PM|climateprediction.net|Scheduler request succeeded
1/10/2007 3:49:51 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/10/2007 3:49:51 PM|climateprediction.net|Reason: To send trickle-up message
1/10/2007 3:49:51 PM|climateprediction.net|(not requesting new work or reporting completed tasks)
1/10/2007 3:49:56 PM|climateprediction.net|Scheduler request succeeded

ID: 25948 · Report as offensive     Reply Quote
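To see how often the message is really recurring, the log lines quoted above can be counted per task. This is only a rough sketch: it assumes each line has the `timestamp|project|message` layout shown in the post, and the function name `count_zero_status_exits` is made up for illustration, not part of BOINC.

```python
import re
from collections import Counter

# Matches the message text quoted in the post above and captures the task name.
ZERO_STATUS = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def count_zero_status_exits(log_lines):
    """Count 'exited with zero status' messages per task name."""
    counts = Counter()
    for line in log_lines:
        # Assumed layout: 'timestamp|project|message' (project may be empty).
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # skip anything that does not look like a log line
        match = ZERO_STATUS.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts

# Three lines copied from the log in the post above.
sample = [
    "1/10/2007 3:21:50 PM|climateprediction.net|Task hadcm3lbm_91jc_25191163_0 exited with zero status but no 'finished' file",
    "1/10/2007 3:21:50 PM||Rescheduling CPU: application exited",
    "1/10/2007 3:43:50 PM|climateprediction.net|Task hadcm3lbm_91jc_25191163_0 exited with zero status but no 'finished' file",
]
print(count_zero_status_exits(sample))
```

A count that keeps climbing for the same task is the "happens repeatedly" case the log message itself warns about.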
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25949 - Posted: 10 Jan 2007, 22:12:26 UTC

Make a backup right now, and then keep your fingers crossed.
It'll either crash, or it won't crash.

ID: 25949 · Report as offensive     Reply Quote
old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25962 - Posted: 11 Jan 2007, 21:29:26 UTC
Last modified: 11 Jan 2007, 21:32:04 UTC

Just to get back: Mine didn't. It has been running like a dream ever since. dp, do you synchronize your clock over the Internet? It's standard in Win XP I think, if you're using that (like most people). In my experience that is quite likely to trigger this error message: small cause, big effect. Just a guess of course, but what I have seen makes it seem quite probable to me.

EDIT: Yep I just checked ;-) XP on both boxes... you can disable the "sync clock" feature by using XP AntiSpy, if you think it is the cause... I didn't, because my system time tends to go completely wrong if I do, and besides, the error messages never did any serious harm to my model.

ID: 25962 · Report as offensive     Reply Quote
old_user113466

Send message
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 25965 - Posted: 11 Jan 2007, 22:23:43 UTC - in response to Message 25962.  


EDIT: Yep I just checked ;-) XP on both boxes... you can disable the "sync clock" feature by using XP AntiSpy, if you think it is the cause... I didn't, cause my system time tends to go completely wrong if I do, and besides, the error messages never did any serious harm to my model.

Thanks, I'll try it.
DP
ID: 25965 · Report as offensive     Reply Quote
old_user113466

Send message
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 25981 - Posted: 13 Jan 2007, 0:02:48 UTC

So am I wasting time and resources?

I have another host (Centrino, XP) that seems to work fine. On this Athlon host I shut off the clock sync and started CPDN again, and this is the result:

Thanks in advance
DP
1/12/2007 2:30:12 PM|climateprediction.net|Resuming task hadcm3lbm_91jc_25191163_0 using hadcm3lb version 508
1/12/2007 4:37:30 PM||Rescheduling CPU: application exited
1/12/2007 4:37:30 PM|climateprediction.net|Computation for task hadcm3lbm_91jc_25191163_0 finished
1/12/2007 4:37:30 PM|rosetta@home|Resuming task 1fvv_1_NMRREF_1_1fvv_1_yyidrenum_14IGNORE_THE_REST_0001_1475_9479_0 using rosetta version 543
1/12/2007 4:37:31 PM|climateprediction.net|Unrecoverable error for result hadcm3lbm_91jc_25191163_0 (<file_xfer_error> <file_name>hadcm3lbm_91jc_25191163_0_15.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3lbm_91jc_25191163_0_16.zip</file_name> <error_code>-161</error_code></file_xfer_error>)
1/12/2007 4:37:31 PM|climateprediction.net|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 4:38:31 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/12/2007 4:38:31 PM|climateprediction.net|Reason: To fetch work
1/12/2007 4:38:31 PM|climateprediction.net|Requesting 8640 seconds of new work, and reporting 1 completed tasks
1/12/2007 4:38:36 PM|climateprediction.net|Scheduler request succeeded
1/12/2007 4:38:36 PM|climateprediction.net|No work from project
1/12/2007 4:38:36 PM|climateprediction.net|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 4:39:36 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/12/2007 4:39:36 PM|climateprediction.net|Reason: To fetch work
1/12/2007 4:39:36 PM|climateprediction.net|Requesting 8640 seconds of new work
1/12/2007 4:39:41 PM|climateprediction.net|Scheduler request succeeded
1/12/2007 4:39:41 PM|climateprediction.net|No work from project
1/12/2007 4:39:41 PM|climateprediction.net|Deferring scheduler requests for 1 minutes and 0 seconds
1/12/2007 4:40:42 PM|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
1/12/2007 4:40:42 PM|climateprediction.net|Reason: To fetch work
1/12/2007 4:40:42 PM|climateprediction.net|Requesting 8640 seconds of new work
1/12/2007 4:40:47 PM|climateprediction.net|Scheduler request succeeded
1/12/2007 4:40:47 PM|climateprediction.net|No work from project


ID: 25981 · Report as offensive     Reply Quote
old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 25985 - Posted: 13 Jan 2007, 0:51:58 UTC

Well, an "unrecoverable error" definitely sounds much worse. I fear that in your case the model wasn't just having an allergic reaction to time sync :-( It might have been, but now it looks to me like you lost it...
ID: 25985 · Report as offensive     Reply Quote
astroWX
Volunteer moderator

Send message
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 25986 - Posted: 13 Jan 2007, 2:03:38 UTC


... unless you have a full backup of the boinc folder.

There are no other error messages logged in the Model's record, such as Negative Pressure or Negative Theta. So, it should be a good candidate to get through on the backup copy.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 25986 · Report as offensive     Reply Quote
old_user113466

Send message
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 25988 - Posted: 13 Jan 2007, 2:54:01 UTC

FWIW
I noticed the last successful upload was at 1800 on 31 Dec 2006, then nothing after the new year???

I did make a backup of the cpdn directory 2 days ago and will try to restore it.

Seems a shame to lose a model after 2500 hrs of CPU time.

DP
ID: 25988 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25989 - Posted: 13 Jan 2007, 5:48:25 UTC

There are no guarantees that say a model will always complete the full 160 years.
A parameter set may go wonky at any point in the model. This includes at 161 years, or 200 years, etc. Just as the real weather can suddenly 'turn nasty', and produce a tornado in the middle of a nice day, while people are out having a picnic.

Lots have been lost in the last few years of the modelling, but the info is still useful.

ID: 25989 · Report as offensive     Reply Quote
old_user113466

Send message
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 25998 - Posted: 13 Jan 2007, 12:37:20 UTC - in response to Message 25989.  

Lots have been lost in the last few years of the modelling, but the info is still useful.

Thanks for the encouragement.
I'll start a new one.

DP
ID: 25998 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,714,904
RAC: 8,478
Message 26019 - Posted: 14 Jan 2007, 10:23:59 UTC

I got the 'No heartbeat from core client - exiting' message on a brand new machine, and traced it to the clock synchronisation: the clock had jumped forward by about two minutes, so of course that was longer than the 30 seconds that BOINC is prepared to wait. But it re-started automatically, and I didn't lose any work on any of the projects I had running.

You can get an idea of how good your computer is at keeping time by looking at the 'Internet time' tab in the Date and Time Properties control panel (double-click on the system clock to open it). The 'Next Synchronisation' line tells you what the clock said before it was changed: the 'successful synchronisation' line tells you what it changed to after synchronisation. If it's jumped forward, then you may have this problem.

Because mine drifts by about two minutes per week, I'm thinking of installing one of those clock tools that can check every 24 hours. That would keep the incremental changes down below 30 seconds, so BOINC should be happier. Does that seem to make sense to anyone?

[NB to corporate users: if your Windows XP Pro computer is part of a Windows Active Directory domain, its time setting is held constant by a local domain controller, and you don't get a chance to synchronise it yourself: there is no 'Internet time' tab in the control panel. You just have to hope that your system administrator has remembered to set up the server time service to synch the whole network from a reliable source....]
ID: 26019 · Report as offensive     Reply Quote
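The arithmetic behind the "check every 24 hours" idea can be sketched as below. It assumes the drift is roughly linear (two minutes per week, as quoted in the post); the function name `correction_per_sync` is hypothetical and only there for illustration.

```python
def correction_per_sync(drift_seconds_per_week, sync_interval_hours):
    """Approximate size of each clock correction, assuming linear drift."""
    drift_per_hour = drift_seconds_per_week / (7 * 24)
    return drift_per_hour * sync_interval_hours

HEARTBEAT_LIMIT = 30.0  # seconds, the figure quoted in the post above

weekly = correction_per_sync(120, 7 * 24)  # sync once a week
daily = correction_per_sync(120, 24)       # sync once a day

print(f"weekly sync: about {weekly:.1f} s per correction")
print(f"daily sync:  about {daily:.1f} s per correction")
print("daily sync stays under the limit:", daily < HEARTBEAT_LIMIT)
```

Under that assumption a weekly sync produces one step of about two minutes, while a daily sync keeps each step well under the 30 seconds mentioned above.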
MikeMarsUK
Volunteer moderator

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 26021 - Posted: 14 Jan 2007, 11:17:48 UTC

As long as it jumps forward less than 30 seconds that should help. A worse problem is if it jumps back in time: even 1 second will cause a 'zero status exit' type message.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 26021 · Report as offensive     Reply Quote
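The two rules of thumb in this thread (a forward jump is tolerated up to about 30 seconds, any backward jump is not) can be written down as a small check. This is a sketch of the rules as described by the posters, not of how BOINC is actually implemented; `classify_clock_step` and its return strings are invented for the example, and treating exactly 30 seconds as too long is an assumption.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds, the figure quoted in this thread

def classify_clock_step(step_seconds, timeout=HEARTBEAT_TIMEOUT):
    """Classify a wall-clock step using the rules of thumb from the thread.

    Positive step = clock jumped forward, negative = clock jumped back.
    """
    if step_seconds < 0:
        # Per the thread, even a one-second step back can cause trouble.
        return "backward: may trigger 'zero status exit'"
    if step_seconds >= timeout:
        # Boundary case (exactly 30 s) is treated conservatively here.
        return "forward past timeout: may trigger 'no heartbeat'"
    return "ok"

for step in (-180, -1, 17, 120):
    print(f"{step:+5d} s -> {classify_clock_step(step)}")
```

The three-minute step back from the first post and the two-minute step forward from Richard's post both land in a problem category, while a daily correction of around 17 seconds does not.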
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,714,904
RAC: 8,478
Message 26024 - Posted: 14 Jan 2007, 12:32:55 UTC

In the end, it didn't need an external tool (sucking up those precious cycles...): MS KB314054 has all the gory details.

I just changed one value in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient\
In the right pane, right-click SpecialPollInterval, and then click Modify.

- then re-start the Windows Time Service.

WARNING: Don't mess around with the Windows registry unless you already know what you're doing: a single slip can cause problems much worse than the one you're trying to solve.
ID: 26024 · Report as offensive     Reply Quote
old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 26041 - Posted: 15 Jan 2007, 1:44:22 UTC

Well, my box is just too unreliable in keeping time for deactivating that... though I definitely long for a nice NTP service... Somehow I have the feeling I'm not as content with Windows as I used to be ;-)
Well, anyway, I ask you to keep your fingers crossed, as I didn't crash a model but instead my whole Windows installation... no idea how or why, I guess it just went on strike after 5 months of too much experimentation ;-) The only thing that helped was doing a complete reinstall, and that's the problem. I have a brand-new backup of my BOINC folder, but I have never actually had to use one of those, so I'm a bit uneasy about it.
If someone has advice about it, it would be very much appreciated. I guess I'm too tired for something like this now anyway (it's almost 3 am here), so I'll wait until tomorrow and do it then...
ID: 26041 · Report as offensive     Reply Quote
astroWX
Volunteer moderator

Send message
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 26044 - Posted: 15 Jan 2007, 5:27:06 UTC

Halfway down Les's post:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4890&nowrap=true#25344

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 26044 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,714,904
RAC: 8,478
Message 26047 - Posted: 15 Jan 2007, 9:39:26 UTC - in response to Message 26041.  

Well, my box is just too unreliable in keeping time for deactivating that...

The registry hack is for changing the time interval between updates - making it more active!

SpecialPollInterval is in seconds, so you can set it to correct the clock as often as you like - though I think the NTP admins would get very annoyed with you if you connected too often. I'm trying once per day.

Good luck with the reinstall.
ID: 26047 · Report as offensive     Reply Quote
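Since SpecialPollInterval is given in seconds, a once-per-day setting works out as below. The sketch only reads the current value and changes nothing; it uses Python's standard `winreg` module (Windows only), and the helper names `hours_to_poll_interval` and `read_special_poll_interval` are made up for illustration. It assumes the value exists under the key quoted earlier in the thread, which may not be true on every machine.

```python
import sys

# Registry path quoted earlier in the thread (under HKEY_LOCAL_MACHINE).
NTP_CLIENT_KEY = r"SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\NtpClient"

def hours_to_poll_interval(hours):
    """Convert hours to the number of seconds SpecialPollInterval expects."""
    return int(hours * 3600)

def read_special_poll_interval():
    """Read the current SpecialPollInterval in seconds (Windows only, read-only)."""
    import winreg  # standard library, but only available on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NTP_CLIENT_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "SpecialPollInterval")
    return value

print("once per day =", hours_to_poll_interval(24), "seconds")
if sys.platform == "win32":
    # Raises FileNotFoundError if the value is not present on this machine.
    print("current value:", read_special_poll_interval())
```

Changing the value is deliberately left to the manual steps and the warning in Richard's earlier post.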
old_user202664

Send message
Joined: 13 Oct 06
Posts: 60
Credit: 7,893
RAC: 0
Message 26058 - Posted: 15 Jan 2007, 18:23:09 UTC

@astro: Thanks a lot for the information :-) I'll read the post behind the link and then do my best.
Richard: My bad, it really was a bit late I guess ;-) in that case I'll definitely try that hack as soon as everything else is up and running. Thanks for keeping your fingers crossed, that can never hurt.
ID: 26058 · Report as offensive     Reply Quote