climateprediction.net (CPDN) home page
Thread 'idea for improvement'

old_user392
Joined: 7 Aug 04
Posts: 57
Credit: 4,168
RAC: 0
Message 3217 - Posted: 7 Sep 2004, 9:19:21 UTC
Last modified: 7 Sep 2004, 9:20:10 UTC

I have thought about the following problem:

You run CPDN on your machine, then it crashes and restarts, or you have to reset, or you rejoin the project after a certain time.
Result: You get the same workunit, but you have to start from zero.

My idea is that if the user has already sent a completed trickle, this trickle would be sent to the user once more along with the workunit, so that she/he doesn't have to start from the very beginning. This seems very efficient to me, because if, for example, 70% has been calculated and you have to restart ... .

What do you think?

Greetings from Berlin(Germany)
Basti
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 3219 - Posted: 7 Sep 2004, 9:33:30 UTC - in response to Message 3217.  

Hi, well, the trickle itself doesn't hold much data; the model state is all in the "restart dump" on your machine. After a crash, the run should first attempt the restart dump for the last day saved (i.e. no more than 144 timesteps back), then the last restart.month, and finally a restart.year. So if it crashes on all three tries (or fewer, if you didn't make it to a "model-month" or "model-year"), there is nothing left for that machine to do. Most crashes seem to come from Win98/ME, odd Linux libraries, etc. (i.e. machine/software errors) rather than from the model itself going unstable or crashing due to bad parameters.
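
The fallback order described above (daily dump, then restart.month, then restart.year) can be sketched roughly as follows. This is only an illustration and not the actual CPDN code: the file name `restart.day`, the function `pick_restart`, and the `try_restart` callback are hypothetical names made up for this sketch.

```python
from pathlib import Path

# Ordered from most recent to oldest checkpoint.
# "restart.day" is a hypothetical name for the daily restart dump.
RESTART_ORDER = ["restart.day", "restart.month", "restart.year"]


def pick_restart(run_dir, try_restart):
    """Return the name of the first restart dump that resumes successfully.

    try_restart is a caller-supplied function (an assumption of this sketch)
    that returns True if the model resumes cleanly from the given file.
    Returns None when every available dump fails.
    """
    for name in RESTART_ORDER:
        path = Path(run_dir) / name
        if not path.exists():
            # The run never reached a model-month or model-year.
            continue
        if try_restart(path):
            return name
    return None
```

If `pick_restart` returns `None`, every available dump has failed, which corresponds to the "nothing left for that machine to do" case.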

old_user1470

Joined: 26 Aug 04
Posts: 1
Credit: 10,222
RAC: 0
Message 3223 - Posted: 7 Sep 2004, 9:41:52 UTC

Unlikely.

Take a look at your BOINC folder: there are many megabytes of data there that make up the current state of the model.

All of that data would have to be backed up and restored to allow a restart.

The disk space requirement for a CPDN unit is specified as 600 MB, so that much might need to be saved and restored, which is not practical.
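
To get a rough feel for why moving that much state is impractical, here is a back-of-the-envelope sketch. The link speeds used below are assumptions chosen for illustration, not figures from this thread, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_mb, link_kbit_per_s):
    """Hours needed to move size_mb megabytes over a link of the given speed.

    Uses decimal units (1 MB = 1,000,000 bytes, 1 kbit = 1,000 bits)
    and ignores protocol overhead.
    """
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (link_kbit_per_s * 1_000)
    return seconds / 3600


# Assumed link speeds, for illustration only.
for label, kbit in [("56 kbit/s modem", 56), ("512 kbit/s broadband", 512)]:
    print(f"{label}: {transfer_hours(600, kbit):.1f} hours for 600 MB")
```

Under these assumptions the full 600 MB would take roughly a day over a modem and a couple of hours over a 512 kbit/s line, in each direction.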

old_user392
Joined: 7 Aug 04
Posts: 57
Credit: 4,168
RAC: 0
Message 3235 - Posted: 7 Sep 2004, 11:05:04 UTC - in response to Message 3219.  

> Hi, well the trickle itself doesn't have much data, it's all in the "restart
> dump" on your machine. On a crash, the run should attempt first the restart
> dump for the last day saved (i.e. within 144 timesteps ago), then the last
> restart.month, then finally tries from a restart.year. So if it crashes on
> all three tries (or less if you didn't make it to a "model-month" or
> "model-year"), there is nothing left for that machine to do. Most crashes
> seem to be errors from Win98/ME, odd Linux libraries, etc, (i.e.
> machine/software errors) rather than the model itself going unstable or
> crashing due to bad parameters etc.

I didn't know that, but it sounds very similar. :-)
Thanks for your update!


Greetings from Berlin(Germany)
Basti


©2024 cpdn.org