Questions and Answers : Unix/Linux : Rewinding
Author | Message |
---|---|
Joined: 24 Feb 05 Posts: 28 Credit: 121,749 RAC: 0 |
I don't get CPDN's rewinding policy. My model hits its first problem after a few thousand timesteps, so it rewinds a model-day and passes over the same timestep without any problem. Okay, I can live with that. However, several thousand timesteps later it hits a problem for the second time in the same WU and rewinds a model-month. Once again, it gets past the timestep it crashed on. I'm pretty sure it'll hit a problem again several thousand timesteps further on, and what then: try to rewind a model-year (which I may or may not have passed by then)? Why can't it just keep rewinding a model-day, seeing as this problem is clearly not the WU's fault, but rather my PC's? |
Joined: 3 Sep 04 Posts: 268 Credit: 256,045 RAC: 0 |
Deleted. Arnaud |
Joined: 24 Feb 05 Posts: 28 Credit: 121,749 RAC: 0 |
Deleted. Huh? |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Some models are unstable and will crash, that's why they have this scheme of rewinding and quitting. |
Joined: 24 Feb 05 Posts: 28 Credit: 121,749 RAC: 0 |
"Some models are unstable and will crash, that's why they have this scheme of rewinding and quitting."

I would think that if a model were unstable, a) it would cause problems very soon, and b) the instability would recur at (approximately) the same timestep. A very current example (it happened no more than an hour ago; some lines have been edited out):

    Starting model ID 49v9_200299685 Phase 1
    49v9_200299685 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
    Model crashed...retrying...restart level 0
    Preparing for restart...
    Rewinding a model-day...
    Starting model ID 49v9_200299685 Phase 1
    49v9_200299685 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
    Model crashed...retrying...restart level 1
    Preparing for restart...
    Rewinding a model-month...
    Error: Restart files for dataout/restart.month not found
    Giving up, this result exceeded crash count for available restart files.

THIS is what I call an unstable model. Having two crashes on the same timestep means something is very wrong. The crashes the previous model was having were thousands of timesteps apart, the first a few thousand TS's from zero. This time I've managed to postpone the restart.year to 10/21 PH 1. As you can see, no whole year yet, but only 3 model-crashes in this much time is a personal record. You see, I'm having memory problems, not model problems. However, that is not the point here.

As I've demonstrated: when a model is bad, it's bad from the beginning. When it's not the model that's bad, but something else, the problem shows up in various circumstances, with absolutely no common denominator. Therefore, in such cases, I feel a restart.day policy should be applied every time. |
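
For readers following the thread, the two policies being compared can be sketched roughly as below. This is not CPDN's actual code. It is a minimal illustration based only on the behaviour in the log above, where restart level 0 rewinds a model-day, level 1 rewinds a model-month, and the run gives up when the needed restart file is missing. The function names, the `available` set, and the "year" third level are assumptions made for illustration (the year step is the poster's guess and is not confirmed by the log).

```python
from typing import Optional, Set

# Restart levels in the order the log shows them being tried.
# "year" as a third level is an assumption taken from the poster's guess.
LEVELS = ["day", "month", "year"]


def next_restart(crash_count: int, available: Set[str]) -> Optional[str]:
    """Escalating policy: the N-th crash (0-based) tries LEVELS[N].

    Returns the restart level to rewind to, or None to give up when the
    levels are exhausted or the needed restart file does not exist.
    """
    if crash_count >= len(LEVELS):
        return None
    level = LEVELS[crash_count]
    return level if level in available else None


def next_restart_day_only(crash_count: int, available: Set[str]) -> Optional[str]:
    """Proposed policy: always rewind one model-day if a day file exists.

    crash_count is ignored here and kept only for a matching signature.
    """
    return "day" if "day" in available else None
```

With only a day restart file on disk, `next_restart(1, {"day"})` returns `None`, which corresponds to the "Giving up" line in the log, while `next_restart_day_only(1, {"day"})` returns `"day"`. Note that the day-only variant as sketched never gives up, so a model that crashes on the same timestep every time would retry indefinitely unless a separate crash limit were added.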
©2024 cpdn.org