Thread 'Unrecoverable error: exit code 1073807364'

Author	Message
old_user109091 Joined: 17 Nov 05 Posts: 4 Credit: 476,967 RAC: 0	Message 18003 - Posted: 10 Dec 2005, 18:33:06 UTC
Last night I was running a CPU-intensive application (a terrain renderer) with BOINC running in the background. I was on phase 2 of the sulphur cycle, and I had BOINC set up to run only when I'm idle for more than 3 minutes. I left to work on something else while the rendering was taking place, and when I returned, my screen was stuck on the BOINC screensaver and my computer was almost completely unresponsive. I managed to get into Task Manager and kill the rendering job, but the computer was still unresponsive. I noticed the sulphur_etcetc*. application running, so I killed it as well. This solved the unresponsiveness problem.
However, this morning, when I looked at my model run, I noticed that it was back near the beginning of phase 1. Upon checking the messages, I found this unrecoverable error, which appeared to have occurred at the same time as I killed the process. Instead of restarting the model later, BOINC went and downloaded a new project.
Why did it do this, and can I expect this to happen every time I'm running another job on my machine that takes up most of the CPU? Is there any way to get the old model back and continue crunching on it?
P.S. Similar things have happened on my work computer (BOINC locks up the machine if I leave it idle while it is running a big job, forcing me to kill the processes), but I haven't checked for error messages there.
Thanks, Dan

old_user56785 Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0	Message 18006 - Posted: 10 Dec 2005, 19:05:52 UTC
Task Manager kills of BOINC/CPDN processes will abort the model! You cannot influence that, other than by not killing the processes. Only running without a network link and reverting to the latest backup can bring you back to running the same model. If the error condition is uploaded to the server, then it is not useful to complete the model. All you can do is 'consider it lost'. My advice is to pause/stop the model before running any CPU-intensive apps.

old_user109091 Joined: 17 Nov 05 Posts: 4 Credit: 476,967 RAC: 0	Message 18010 - Posted: 10 Dec 2005, 20:40:26 UTC - in response to Message 18006.
> Task Manager kills of BOINC/CPDN processes will abort the model! You cannot influence that, other than by not killing the processes. Only running without a network link and reverting to the latest backup can bring you back to running the same model. If the error condition is uploaded to the server, then it is not useful to complete the model. All you can do is 'consider it lost'. My advice is to pause/stop the model before running any CPU-intensive apps.
Alright, I'll keep that in mind from now on. Just FYI, my work computer's model run is just fine.

old_user118955 Joined: 27 Nov 05 Posts: 6 Credit: 18,962 RAC: 0	Message 18124 - Posted: 13 Dec 2005, 3:45:09 UTC - in response to Message 18006.
> Task Manager kills of BOINC/CPDN processes will abort the model! You cannot influence that, other than by not killing the processes. Only running without a network link and reverting to the latest backup can bring you back to running the same model. If the error condition is uploaded to the server, then it is not useful to complete the model. All you can do is 'consider it lost'. My advice is to pause/stop the model before running any CPU-intensive apps.
Why would uploading to the server lose all trace of progressed work on the model? Shouldn't the client software be set to create a backup of the previous result in a temp directory on the client PC, so the user does not lose the model? Then the user could choose whether to reinstate the backup copy or start again, rather than being forced to start again. This is basic programming and would save a lot of headaches.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 18126 - Posted: 13 Dec 2005, 4:18:56 UTC
> Why would uploading to the server lose all trace of progressed work on the model ...
This is the wrong way around. The model crashed. This information was uploaded to the server, and the old data was removed from the user's computer. Then a new data set was downloaded. There is no point in making a backup of a crashed model. The time for backups is BEFORE the crash.
As for basic programming, the software being run has been ported from the 64-bit Fortran software that runs on The Met's supercomputers. It is over a million lines of code, the source is over 50 megabytes in size, and it took a couple of years to get it to run on desktops. The original was written by many scientists over many years.
And making regular backups IS a good idea. We have been telling people this for ages. To do this, you first have to suspend BOINC and THEN copy the entire BOINC folder. This is to prevent the many files involved from getting out of sync during the long copy process, which would make the backup useless.
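The suspend-then-copy order described above can be sketched in a few lines of Python. This is only an illustration, not an official CPDN tool: the function names, the timestamped folder naming, and the idea of driving the client through `boinccmd --set_run_mode` are assumptions added here, and the thread does not say whether suspending is always enough or whether the client should be exited completely before copying.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root, suspend, resume):
    """Suspend BOINC, copy the entire BOINC folder, then resume.

    `suspend` and `resume` are caller-supplied callables (hypothetical
    hooks), so the ordering can be checked without a BOINC install.
    """
    boinc_dir = Path(boinc_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Each backup goes into its own timestamped folder (naming is an assumption).
    target = Path(backup_root) / f"{boinc_dir.name}-{stamp}"

    suspend()  # stop the model first so files do not change mid-copy
    try:
        shutil.copytree(boinc_dir, target)  # copy the whole folder, not a subset
    finally:
        resume()  # resume crunching even if the copy failed
    return target


def run_boinccmd(*args):
    """Call the BOINC command-line tool; assumes `boinccmd` is on PATH."""
    subprocess.run(["boinccmd", *args], check=True)


def backup_with_boinccmd(boinc_dir, backup_root):
    """Variant that suspends and resumes through boinccmd run modes."""
    return backup_boinc_folder(
        boinc_dir,
        backup_root,
        suspend=lambda: run_boinccmd("--set_run_mode", "never"),
        resume=lambda: run_boinccmd("--set_run_mode", "auto"),
    )
```

Calling `backup_with_boinccmd` with your own BOINC folder and a backup location would make one timestamped copy. The `try`/`finally` is there so that a failed copy does not leave BOINC suspended.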