climateprediction.net (CPDN) home page
Thread 'Rewinding'

Questions and Answers : Unix/Linux : Rewinding

copycat
Joined: 24 Feb 05
Posts: 28
Credit: 121,749
RAC: 0
Message 16814 - Posted: 27 Oct 2005, 21:55:39 UTC
Last modified: 27 Oct 2005, 21:56:30 UTC

I don't get CPDN's rewinding policy. My model hits its first problem after a few thousand timesteps, so it rewinds a model-day and then passes the same timestep without any problem. Okay, I can live with that. However, several thousand timesteps later it hits a problem a second time in the same WU, and this time it rewinds a model-month. Once again, it gets past the timestep it crashed on. I'm pretty sure it will hit another problem several thousand timesteps from now, and what then: will it try to rewind a model-year (which I may or may not have reached by then)? Why can't it just keep rewinding a model-day, seeing as this problem is clearly not the WU's fault but my PC's?
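To make the question concrete, here is a minimal Python sketch of the escalation as the poster describes it: first crash rewinds a model-day, second a model-month, and (the poster's guess) third a model-year, after which the run gives up. The function name `escalating_action`, the day/month/year ladder beyond the second crash, and the give-up rules are assumptions inferred from this thread and the model log posted later in it; this is not CPDN's actual restart logic.

```python
# Hypothetical sketch of the escalating rewind scheme described in this post.
# Crash 1 -> rewind a model-day, crash 2 -> a model-month, crash 3 -> a
# model-year (assumed), then give up. NOT CPDN's actual implementation.

REWIND_LEVELS = ["day", "month", "year"]  # rewind target for crash 1, 2, 3


def escalating_action(crash_count, available_restarts):
    """Return the restart dump to rewind to after `crash_count` crashes in
    one work unit, or None if the run should give up.

    crash_count: crashes so far in this work unit, counting from 1
                 (crash 1 corresponds to "restart level 0" in the model log).
    available_restarts: set of restart dumps that exist, e.g. {"day", "month"}.
    """
    if crash_count < 1:
        raise ValueError("crash_count must be at least 1")
    level = crash_count - 1
    if level >= len(REWIND_LEVELS):
        return None  # more crashes than there are restart levels
    target = REWIND_LEVELS[level]
    if target not in available_restarts:
        return None  # cf. "Restart files for dataout/restart.month not found"
    return target


if __name__ == "__main__":
    # A run that already has day, month and year restart dumps on disk.
    dumps = {"day", "month", "year"}
    for n in range(1, 5):
        print(n, escalating_action(n, dumps))
```

Note that in this scheme each crash consumes a rewind level regardless of where it happened, which is exactly what the poster objects to: scattered, unrelated crashes are treated the same as a crash that keeps recurring.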
Arnaud
Joined: 3 Sep 04
Posts: 268
Credit: 256,045
RAC: 0
Message 16837 - Posted: 28 Oct 2005, 18:18:25 UTC
Last modified: 28 Oct 2005, 18:25:50 UTC

Deleted.
Arnaud
copycat
Joined: 24 Feb 05
Posts: 28
Credit: 121,749
RAC: 0
Message 16842 - Posted: 28 Oct 2005, 21:47:53 UTC - in response to Message 16837.  

Deleted.

Huh?
Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 16846 - Posted: 29 Oct 2005, 0:40:36 UTC

Some models are unstable and will crash, that's why they have this scheme of rewinding and quitting.

copycat
Joined: 24 Feb 05
Posts: 28
Credit: 121,749
RAC: 0
Message 16882 - Posted: 30 Oct 2005, 16:23:41 UTC - in response to Message 16846.  

Some models are unstable and will crash, that's why they have this scheme of rewinding and quitting.

I would think that if a model were unstable, it would (a) cause problems very soon, and (b) crash again at (approximately) the same timestep.

A very recent example, from less than an hour ago (some lines have been edited out):
Starting model ID 49v9_200299685 Phase 1
49v9_200299685 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 49v9_200299685 Phase 1
49v9_200299685 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Error: Restart files for dataout/restart.month not found
Giving up, this result exceeded crash count for available restart files.

THIS is what I call an unstable model. Two crashes on the same timestep mean something is very wrong.

The crashes my previous model was having were thousands of timesteps apart, the first a few thousand timesteps from zero. This time I've managed to postpone the restart.year to 10/21 PH 1. As you can see, that's not a whole year yet, but only 3 model crashes in this much time is a personal record. You see, I'm having memory problems, not model problems.

However, that is not the point here. As I've demonstrated: when a model is bad, it's bad from the beginning. When it's not the model that's bad but something else, the problem shows up in varied circumstances with absolutely no common denominator. Therefore, in such cases, I feel a restart.day policy should be applied every time.
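The policy proposed here can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions: `day_only_action`, the `tolerance` window standing in for "approximately the same timestep", and the optional `max_crashes` cap are all hypothetical and not part of the CPDN client. It always rewinds one model-day, and gives up only when a crash recurs at (about) the same timestep as an earlier one.

```python
# Hypothetical sketch of the alternative policy argued for in this post:
# always rewind a single model-day, but give up when a crash recurs at
# (approximately) the same timestep as an earlier one, which the poster takes
# as the sign of a genuinely unstable model rather than a hardware or memory
# fault. Names and the optional limits are assumptions, not CPDN code.


def day_only_action(crash_timesteps, tolerance=0, max_crashes=None):
    """Decide what to do after the most recent crash in a work unit.

    crash_timesteps: timesteps at which crashes occurred, oldest first; the
                     last entry is the crash that just happened.
    tolerance: how many timesteps apart two crashes may be and still count
               as "the same timestep" (hypothetical knob).
    max_crashes: optional cap on total crashes per work unit (hypothetical).
    Returns "day" to rewind one model-day, or None to give up.
    """
    if not crash_timesteps:
        raise ValueError("need at least one crash timestep")
    latest = crash_timesteps[-1]
    earlier = crash_timesteps[:-1]
    # Repeated crash at (about) the same timestep: treat the model as unstable.
    if any(abs(latest - ts) <= tolerance for ts in earlier):
        return None
    # Optional safety valve so a very flaky machine cannot rewind forever.
    if max_crashes is not None and len(crash_timesteps) > max_crashes:
        return None
    return "day"


if __name__ == "__main__":
    # Like the log in this post: two crashes, both at TS 1 -> give up.
    print(day_only_action([1, 1]))
    # Illustrative scattered crashes thousands of timesteps apart -> rewind.
    print(day_only_action([3000, 9000, 15000]))
```

The `max_crashes` cap is one possible answer to the obvious objection that a day-only policy could keep rewinding indefinitely on a machine that never stabilizes; whether CPDN would want such a cap, or what value it should have, is not stated in this thread.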

©2024 cpdn.org