Message boards : Number crunching : Is anyone else getting multiple runtime errors?
Author | Message |
---|---|
Joined: 25 Jun 10 Posts: 4 Credit: 188,122 RAC: 0 |
Two climate models have now failed due to runtime errors. Is this problem occurring frequently for others? I hope the other models with 300+ hours on them will be able to finish. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
The HADCM3N model hadcm3n_ym0x_1900_40_007361403_0 failed on your quad today, but I can't see the other one. The other Windows/Intel computer running a model from the work unit has produced two trickles, whereas the model on your computer failed after one trickle. That argues that the problem does not lie with the model itself, but is most likely some local and possibly temporary problem. Was anything happening on the computer when it failed? Virus scan? Microsoft Update? |
Joined: 25 Jun 10 Posts: 4 Credit: 188,122 RAC: 0 |
I might have been running too many programs at once. I can get away with playing video games and running 3 or 4 climate models on a four-core CPU without lag, but it might have caused one of the models to crash. Thank you for your help. |
Joined: 16 Jul 11 Posts: 2 Credit: 351,861 RAC: 0 |
My first attempt at a climate run failed with a runtime error in Windows XP. It had an Oct. 15 deadline and had completed 4%. I opened up my mailer and the program jumped to 100% complete, stayed in the running state rather than 'ready to report', and Windows errors started appearing. I suspended it for now and immediately got another segment. |
Joined: 16 Jul 11 Posts: 2 Credit: 351,861 RAC: 0 |
After the unit cycled back and attempted to run, the error occurred once and the unit went to Computation Error. Should I just abort it? |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Unless you have a recent backup to restore from, there is nothing to do. The WU is dead. When it reports, it will disappear from your machine. Starting your email client seems a trivial thing to cause a crash. Usually, the kind of programs that do this are very resource intensive, such as video editors and the like that consume tons of RAM and CPU cycles. |
Joined: 8 Aug 04 Posts: 69 Credit: 1,561,341 RAC: 0 |
You are not the only one, if that is any comfort to you. :) I had 4 models running, 60% done, but this morning an error message said that my Catalyst driver had problems and needed to close. The screen was unresponsive, so I had to shut Windows down and reboot. When I got the machine back up, BOINC showed no tasks running..?.? I have been running MemTest86+ for the last 4 hours, just to make sure the hardware is OK. No errors. On my account page all tasks show computing error. The Catalyst driver has been deleted. I have taken 4 new tasks in and disabled network traffic. This time I am going to make backups twice a day. Best of luck to you. ChrisD |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Even experienced and reliable crunchers can have a catastrophic model crash occasionally! Cpdn news |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Chris, This may be teaching grandmother to suck eggs but make sure you suspend computation and stop BOINC before doing your backups. Dave |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
"Even experienced and reliable crunchers can have a catastrophic model crash occasionally!" On current machines those crashes have lost a lot of their nastiness. Somehow I considered the earlier models on P3 Tualatin and Athlon XP Thoroughbred to be way more valuable ;-) |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
This is specifically about model crashes with the hadcm3n models -- none available right now -- but if there are more soon, keep it in mind. First, do backups! Second, keep crunching. Backups won't help if you get the evil 193 error at the first upload (or at the last -- happened to me once, happened to a few others). But they will help if you get a disk read error, an "out of space" error, any driver or forced-reboot error, or a SIGSEGV on your PC. Or a mains power failure. If you look at the "top computers" tab on this site, only 1 out of 5 of them has completed any hadcm3n model. The stats for us midrange crunchers are much better: at least 70% complete (not counting misconfigured or overclocked machines). If you have a backup you can just restore and keep on crunching. PS -- maybe this discussion should be over on the number crunching board. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
... You're right ... done |
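
For anyone who wants to script the backup routine discussed in this thread, here is a minimal sketch in Python. It only copies a data directory into a timestamped folder and prunes older copies. The function name, the `boinc-backup-` folder prefix, and the `keep` parameter are made up for illustration, and it assumes you have already suspended computation and shut BOINC down as advised above; the script does not do that for you.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root, keep=4):
    """Copy data_dir into a new timestamped folder under backup_root.

    Assumption: the BOINC client has already been suspended and stopped,
    as advised in the thread. This function does not check or enforce that.
    Hypothetical helper, not part of BOINC or CPDN.
    """
    data_dir = Path(data_dir)
    backup_root = Path(backup_root)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"no such data directory: {data_dir}")
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically when sorted as plain strings.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    shutil.copytree(data_dir, target)

    # Keep only the newest `keep` backups and delete the rest.
    backups = sorted(p for p in backup_root.glob("boinc-backup-*") if p.is_dir())
    for old in backups[:-keep]:
        shutil.rmtree(old)
    return target
```

You would call it with your own BOINC data directory and a backup location on another disk, for example `backup_boinc_data(my_data_dir, my_backup_dir)`. Both paths are yours to supply, since they differ between installs. To restore, stop BOINC again and copy the chosen backup folder back over the data directory.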
©2024 cpdn.org