Thread 'Crashed? Should I abort?'

old_user194621

Joined: 4 Aug 06
Posts: 4
Credit: 25,921
RAC: 0
Message 32576 - Posted: 11 Feb 2008, 15:51:58 UTC

My laptop was forced to shut down a couple of weekends ago. This crashed my CPDN and QMC WUs running at the time.

When I restarted BOINC, CPDN restarted from 0% (I don't back up), but trickles continued to appear here:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=6980399

I remember Phase 3 trickles after 26 January, but they're not on that page anymore.

The WU is currently at 28.992%, and my PC has been diligently sending trickles up to now.

Is this WU still valid? Should I abort it?
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 32578 - Posted: 11 Feb 2008, 18:14:49 UTC
Last modified: 11 Feb 2008, 18:21:20 UTC

Hi Gilbert, welcome to the forum.

Here's the model:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6980399

(It would help if you could unhide/show your computer while we're discussing your model.)

The model has indeed gone back to the beginning. It won't show any new trickles until it gets past the model date when it crashed. The server will accept new trickles from it and it will be used by the researchers. As you've already repeated nearly 30% of the work, the best idea is to continue and complete the model. The phase 3 graphs will show when you've completed phase 3. When your computer sends new trickles you'll get credits again.

You could find that backing up the BOINC folder contents is easier and quicker than you think. In the README about backups linked in my signature, the first method explained by Les is the easiest.
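For anyone who would rather script a backup than do it by hand, here is a minimal sketch in Python. It assumes a backup amounts to copying the whole BOINC folder to a new location while BOINC is fully shut down, which may differ from the methods in the README. The function name `backup_boinc_folder` and the `boinc-backup-` folder prefix are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(data_dir, backup_root):
    """Copy the whole BOINC folder into a new timestamped folder.

    Assumption: BOINC has been exited completely before this runs, so no
    model files are being written while they are copied.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")

    # A timestamp in the name keeps older backups instead of overwriting them.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"

    # copytree creates dest (and missing parents) and raises if dest exists.
    shutil.copytree(src, dest)
    return dest
```

Call it with the path to your own BOINC folder and a destination on another drive. Both paths depend on your installation, so none are filled in here.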

In the README about crashes and problems, it would be a good idea to look at item #5 by Mike.

I hope you are keeping your laptop cool while you're running the model. If necessary you can reduce the CPU usage to less than 100% and it's a good idea to raise the laptop above the table surface a little bit (not only using the little feet at the back). This allows air to circulate underneath.

Best of luck with the model. Thank you for persevering with it. Let us know how it progresses.

Cpdn news
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 32580 - Posted: 11 Feb 2008, 18:57:12 UTC
Last modified: 11 Feb 2008, 19:02:56 UTC

Gilbert,

If it's any help, the exact same thing just happened to me. I activated the network connection to upload the phase data and then closed the machine down, but I suspect that the post-processing was not complete and it got confused. The tricky points with slabs are the phase changes. If something odd happens at that point, then they can rewind.

My model rewound at the end of the first phase and has just started trickling again in phase 2 (here - it took a while to catch up again, and the cumulative sec/timestep has jumped - you can see it went wrong at the phase change as well).

In future I'll try not to shut a machine down until one trickle after the phase change. Rather conservative perhaps, but better than wasting a week.

Iain
old_user194621

Joined: 4 Aug 06
Posts: 4
Credit: 25,921
RAC: 0
Message 32586 - Posted: 12 Feb 2008, 13:30:03 UTC - in response to Message 32578.  

mo.v wrote:

Hi Gilbert, welcome to the forum.


Thanks, mo.v!
:-)

mo.v wrote:

Here's the model:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6980399

(It would help if you could unhide/show your computer while we're discussing your model.)


I've just changed my preferences to show my machines.

mo.v wrote:

The model has indeed gone back to the beginning. It won't show any new trickles until it gets past the model date when it crashed. The server will accept new trickles from it and it will be used by the researchers. As you've already repeated nearly 30% of the work, the best idea is to continue and complete the model. The phase 3 graphs will show when you've completed phase 3. When your computer sends new trickles you'll get credits again.


Ok, I'll let it finish. I'm just glad the data it's crunching is still useful.
:-)

mo.v wrote:

You could find that backing up the BOINC folder contents is easier and quicker than you think. In the README about backups linked in my signature, the first method explained by Les is the easiest.

In the README about crashes and problems, it would be a good idea to look at item #5 by Mike.


I browsed through the fora last night, both to see if I could figure out if something was happening project-wide (there was, a few weeks ago with the delayed trickles), and if I could see some useful hints, including the various stickies.

The backup procedures are pretty straightforward. I just never bothered because of the babysitting that entailed. At any rate, I don't lose too many WUs because of BOINC problems, et al.

mo.v wrote:

I hope you are keeping your laptop cool while you're running the model. If necessary you can reduce the CPU usage to less than 100% and it's a good idea to raise the laptop above the table surface a little bit (not only using the little feet at the back). This allows air to circulate underneath.


While at home, it sits on an elevated aluminum cooling pad with two fans blowing on the laptop's bottom. In the office, the air conditioning is enough to keep it cool, even with 100% load on its dual-core heart.

mo.v wrote:

Best of luck with the model. Thank you for persevering with it. Let us know how it progresses.


It's currently at 31.475%, waiting for its turn to run again. A huge SETI Beta WU caused my notebook to stop accepting work from all other projects for the past two days or so.

It's worked off its anxiety by now, though. I've got new WUs from two other projects at the moment.

Again, thanks for the quick, straightforward, and friendly help!
old_user194621

Joined: 4 Aug 06
Posts: 4
Credit: 25,921
RAC: 0
Message 32587 - Posted: 12 Feb 2008, 13:48:40 UTC - in response to Message 32580.  

Hi Iain!

It's never fun to lose some work done, especially with the skyrocketing price of energy worldwide. Nobody should have to deal with such waste.

But, heck, we're a bunch of people who care about the science (most of us, hopefully), never mind the paradox of crunching for a climate change solution while adding to our collective carbon footprint.

Everyone seeks a balance between risk and performance in his tech endeavors. Yup, losing a week's work is definitely a bummer, but we all have little tricks here and there so that Murphy doesn't crash our rigs too often.

It's good to see all these helpful threads, run by friendly folks.

Thanks for being one of the good guys, Iain! Happy crunching!


Gilbert
:-)
Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 32589 - Posted: 12 Feb 2008, 17:35:26 UTC - in response to Message 32587.  
Last modified: 12 Feb 2008, 17:41:29 UTC

GilbertP wrote:

It's good to see all these helpful threads, run by friendly folks.

Thanks for being one of the good guys, Iain! Happy crunching!


Hi Gilbert, welcome to the forum. It's also good to see people appreciating the time and effort the moderators put into this, and I absolutely concur that Iain's a very good guy (Mo too, if she doesn't mind being referred to as a "guy"!)
;-)
Visit the Scotland team
old_user194621

Joined: 4 Aug 06
Posts: 4
Credit: 25,921
RAC: 0
Message 32592 - Posted: 13 Feb 2008, 0:49:29 UTC - in response to Message 32589.  

GilbertP wrote:

It's good to see all these helpful threads, run by friendly folks.

Thanks for being one of the good guys, Iain! Happy crunching!


Hi Gilbert, welcome to the forum. It's also good to see people appreciating the time and effort the moderators put into this, and I absolutely concur that Iain's a very good guy (Mo too, if she doesn't mind being referred to as a "guy"!)
;-)


Oops... sorry for that... I had hoped the term was gender neutral.

;-)
