climateprediction.net (CPDN) home page
Thread 'Unrecoverable error: exit code 1073807364'

Questions and Answers : Windows : Unrecoverable error: exit code 1073807364

old_user109091

Joined: 17 Nov 05
Posts: 4
Credit: 476,967
RAC: 0
Message 18003 - Posted: 10 Dec 2005, 18:33:06 UTC

Last night I was running a CPU-intensive application (a terrain renderer) with BOINC running in the background. I was on phase 2 of the sulphur cycle, and I had BOINC set up to run only when I'm idle for more than 3 minutes. I left to work on something else while the rendering was taking place, and when I returned, my screen was stuck on the BOINC screensaver and my computer was almost completely unresponsive. I managed to get into Task Manager and kill the rendering job, but the computer was still unresponsive. I noticed the sulphur_*etc*etc* application running, so I killed it as well. This solved the unresponsiveness problem.

However, this morning, when I looked at my model run, I noted that it was back near the beginning of phase 1. Upon checking the messages, I found this unrecoverable error, which appeared to have occurred at the same time as I killed the process. Instead of restarting the model later, BOINC went and downloaded a new project. Why did it do this, and can I expect this to happen every time I'm running another job on my machine that takes up most of the CPU? Is there any way to get the old model back and continue crunching on it?

P.S. Similar things have happened on my work computer: BOINC locks up my machine if I leave it idle while it is running a big job, forcing me to kill the processes. I haven't checked for error messages there, though.

Thanks,

Dan
ID: 18003
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 18006 - Posted: 10 Dec 2005, 19:05:52 UTC

Killing BOINC/CPDN processes from Task Manager will abort the model!
You cannot influence that, other than by not killing the processes.

Only running without a network link and reverting to the latest backup can get you back to running the same model.
If the error condition has been uploaded to the server, then there is no use in completing the model. All you can do is 'consider it lost'.

My advice is to pause/stop the model before running any CPU-intensive apps.
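
For reference, the exit code in the thread title can be decoded into the fields Windows uses for status values. The sketch below is only an illustration: `describe_exit_code` and `KNOWN_CODES` are made-up names, and the single table entry reflects my understanding that 0x40010004 is listed in the Windows headers as DBG_TERMINATE_PROCESS. That this is the code left behind by a Task Manager kill is an inference from what happened in this thread, not something the BOINC client reported.

```python
# Hypothetical helper: split a 32-bit Windows exit code into
# NTSTATUS-style fields (severity, facility, code).

# Small lookup table; the name is taken from the Windows SDK headers
# as far as I know, and the table is deliberately not exhaustive.
KNOWN_CODES = {
    0x40010004: "DBG_TERMINATE_PROCESS",
}

SEVERITY_NAMES = ["success", "informational", "warning", "error"]


def describe_exit_code(code):
    """Return the hex form and bit fields of a Windows exit code."""
    code &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "hex": f"0x{code:08X}",
        "severity": SEVERITY_NAMES[code >> 30],   # bits 31-30
        "facility": (code >> 16) & 0x0FFF,        # bits 27-16
        "code": code & 0xFFFF,                    # bits 15-0
        "name": KNOWN_CODES.get(code, "unknown"),
    }


if __name__ == "__main__":
    print(describe_exit_code(1073807364))
```

Read this way, the number is a status recording that the process was terminated from outside, which matches the kill described in the first post, rather than a fault inside the model itself.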
ID: 18006
old_user109091

Joined: 17 Nov 05
Posts: 4
Credit: 476,967
RAC: 0
Message 18010 - Posted: 10 Dec 2005, 20:40:26 UTC - in response to Message 18006.  

> Killing BOINC/CPDN processes from Task Manager will abort the model!
> You cannot influence that, other than by not killing the processes.
>
> Only running without a network link and reverting to the latest backup can get you back to running the same model.
> If the error condition has been uploaded to the server, then there is no use in completing the model. All you can do is 'consider it lost'.
>
> My advice is to pause/stop the model before running any CPU-intensive apps.


Alright, I'll keep that in mind from now on. Just FYI, my work computer's model run is just fine.
ID: 18010
old_user118955

Joined: 27 Nov 05
Posts: 6
Credit: 18,962
RAC: 0
Message 18124 - Posted: 13 Dec 2005, 3:45:09 UTC - in response to Message 18006.  

> Killing BOINC/CPDN processes from Task Manager will abort the model!
> You cannot influence that, other than by not killing the processes.
>
> Only running without a network link and reverting to the latest backup can get you back to running the same model.
> If the error condition has been uploaded to the server, then there is no use in completing the model. All you can do is 'consider it lost'.
>
> My advice is to pause/stop the model before running any CPU-intensive apps.


Why would uploading to the server lose all trace of the work already done on the model? Shouldn't the client software be set to create a backup of the previous result in a temp directory on the client PC, so the user does not lose the model? Then the user could choose whether to reinstate the backup copy or start again, rather than being forced to start again. This is basic programming and would save a lot of headaches.

ID: 18124
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 18126 - Posted: 13 Dec 2005, 4:18:56 UTC

> Why would uploading to the server lose all trace of progressed work on the model ...

This is the wrong way around.
The model crashed. This information was uploaded to the server, and the old data was removed from the user's computer. Then a new data set was downloaded.
There is no point in making a backup of a crashed model. The time for backups is BEFORE the crash.

As for basic programming, the software being run was ported from the 64-bit Fortran software that runs on the Met Office's supercomputers.
It is over a million lines of code, the source is over 50 megabytes in size, and it took a couple of years to get it running on desktops.
The original was written by many scientists over many years.

And making regular backups IS a good idea. We have been telling people this for ages.
To do this, you first have to suspend BOINC and THEN copy the entire BOINC folder.
This is to prevent the many files involved from getting out of sync during the long copy process, which would make the backup useless.
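
The copy step can be scripted. What follows is a minimal sketch, not an official CPDN or BOINC tool: `backup_boinc_folder` and its parameters are hypothetical names, and the script assumes BOINC has ALREADY been suspended or exited by hand, because it has no way to check that itself.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root, label=None):
    """Copy the entire BOINC folder into a new, separately named folder.

    Assumption: BOINC is suspended or not running, so no file changes
    while the copy is in progress. Returns the path of the new backup.
    """
    src = Path(boinc_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")

    # Name each backup after the time it was taken unless a label is given.
    stamp = label or time.strftime("%Y%m%d-%H%M%S")
    dest = (Path(backup_root) / f"boinc-backup-{stamp}").resolve()

    # Refuse to place the backup inside the folder being copied.
    if dest == src or src in dest.parents:
        raise ValueError("backup folder must be outside the BOINC folder")

    # copytree raises an error if dest already exists, so an older backup
    # is never silently mixed with files from a newer one.
    shutil.copytree(src, dest)
    return dest
```

A call might look like `backup_boinc_folder(r"C:\Program Files\BOINC", r"D:\boinc-backups")`, where both paths are examples only and need adjusting to the actual install and backup locations. Copying the whole folder in one go, rather than picking out individual files, follows the advice above about keeping the files in sync.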

ID: 18126
