Message boards : Number crunching : This good or bad?
Author | Message |
---|---|
Joined: 3 Sep 04 Posts: 3 Credit: 796,077 RAC: 0 |
Today I was greeted with a Windows XP closed-application error message:

Faulting application hadsm3um_4.12_windows_intelx86.exe, version 0.0.0.0, faulting module unknown, version 0.0.0.0, fault address 0x00000001.

Is this due to the server outage, or am I the only one? :) I have never had a cpdn WU do this. Yesterday hadsm began this activity:

2005-06-21 23:32:07 [climateprediction.net] Result 3q7u_200195693_0 exited with zero status but no 'finished' file
2005-06-21 23:32:07 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2005-06-21 23:32:07 [climateprediction.net] Restarting result 3q7u_200195693_0 using hadsm3 version 4.12
(repeats a few times and an hour later)
2005-06-22 00:26:19 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2005-06-22 00:26:22 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded

Everything was okay after that, and the error messages disappeared. Half a day later the application error occurred, apparently while hadsm was running:

2005-06-22 15:20:39 [climateprediction.net] Restarting result 3q7u_200195693_0 using hadsm3 version 4.12
2005-06-22 16:20:40 [climateprediction.net] Pausing result 3q7u_200195693_0 (removed from memory)

The error happened during this time, at 15:31:00 according to the Windows application event log, but it looks as if hadsm never stopped. The WU has been crunched since then:

2005-06-22 16:27:55 [climateprediction.net] Restarting result 3q7u_200195693_0 using hadsm3 version 4.12
2005-06-22 17:27:55 [climateprediction.net] Pausing result 3q7u_200195693_0 (removed from memory)

I'm not worried about the credit situation with the server - this is all for the science - but I'm not sure whether a faulting application that apparently kept running is a good thing or not. Should I ditch this WU, or just wait until the trickle server is up again and see what happens? |
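For anyone who wants to check how often the "no 'finished' file" message shows up per result, here is a rough Python sketch. It assumes the messages have been copied into a text file or list, one per line, in the same "date time [project] message" layout as quoted above; the function name and the regular expressions are my own guesses based on those lines only, not anything taken from BOINC itself.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in the post:
# "YYYY-MM-DD HH:MM:SS [project] message text"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] (?P<msg>.*)$"
)

# The warning we are interested in, with the result name captured.
NO_FINISHED_RE = re.compile(
    r"^Result (?P<result>\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Count "no 'finished' file" messages per result name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # skip anything that is not a timestamped message line
        hit = NO_FINISHED_RE.match(parsed.group("msg"))
        if hit:
            counts[hit.group("result")] += 1
    return counts


# Sample input: three of the lines quoted in the post above.
sample = [
    "2005-06-21 23:32:07 [climateprediction.net] Result 3q7u_200195693_0 "
    "exited with zero status but no 'finished' file",
    "2005-06-21 23:32:07 [climateprediction.net] If this happens repeatedly "
    "you may need to reset the project.",
    "2005-06-21 23:32:07 [climateprediction.net] Restarting result "
    "3q7u_200195693_0 using hadsm3 version 4.12",
]

for result, n in count_unfinished_exits(sample).items():
    print(result, n)
```

A count that keeps climbing for the same result would match the "if this happens repeatedly" hint in the log; a handful of hits around a single interruption probably would not.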
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
Can you say which BOINC version you are running? |
Joined: 3 Sep 04 Posts: 3 Credit: 796,077 RAC: 0 |
4.19 |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
I asked about the BOINC version because some of us have experienced problems with the new BOINC Manager apparently losing contact with the application, but this would not apply here. The error messages you got about exiting with no finished file suggest some sort of interruption during file handling, I believe, but might not be related to the other error message. I would check the graphics (right click on the app in the BOINC work tab) and see what the CPDN application seems to be doing. If it is running normally and the globe is as you would expect, then I would assume all is well and carry on, at least until somebody or something tells you otherwise. It would seem wise to retain a backup, though, in case it is a recurrent problem. I doubt that it is related to the server problems. I assume you closed and restarted BOINC. |
Joined: 28 Aug 04 Posts: 90 Credit: 2,736,552 RAC: 0 |
One of my boxes no longer behaves as it did before. After downloading a Hadsm_4.12 model, hadsm_4.12 crashes continuously, and after this has happened several times the whole result crashes. This box has already completed one run successfully with a lower version of Hadsm (I believe it was 4.04). The system setup is the same as on my other hosts (WinXP Pro SP2, Norton AV 2004, BOINC 4.19; no other AV, malware, spyware, or anything else). I will watch this closely, and if things don't return to normal, I will unfortunately have to detach this box from cp.net. Ciao |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
> One of my boxes ....... continuously crashes hadsm_4.12.

Can you tell us what error messages you are getting? Is it always the same ones? |
Joined: 28 Aug 04 Posts: 90 Credit: 2,736,552 RAC: 0 |
> Can you tell us what error messages you are getting? Is it always the same ones?

Hi Andrew, it's rather difficult at the moment because this box is located elsewhere. On Tuesday I can have a look at the stderr.txt, but I don't know how long information about crashes is kept in this log file. Looking at <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=22924">its result page</a>, though, one can see that the results seem to have reported the exit code -5. I also have a self-written program that monitors my working boxes and their model states. Maybe I will find the time to add a self-diagnostic event log that is reported to my server application in such cases, but this will take some time. Ciao |
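As a rough idea of what such a self-diagnostic event log could look like, here is a minimal Python sketch. It is not the monitoring program mentioned above (which has not been posted); every name in it (`make_event`, `append_event`, `read_events`, the field names, the file layout) is made up for illustration. It simply appends one JSON record per line to a local file, which a server application could later collect.

```python
import json
from datetime import datetime, timezone


def make_event(host, result, kind, detail=""):
    """Build one event record (all field names are hypothetical)."""
    return {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "host": host,
        "result": result,
        "kind": kind,      # e.g. "crash", "restart"
        "detail": detail,  # free text, e.g. the reported exit code
    }


def append_event(path, event):
    """Append the event as a single JSON line, so earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def read_events(path):
    """Read all events back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

One record per line keeps the file easy to append to after a crash and easy to ship to another machine, since a partial last line can be dropped without losing the rest.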
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
> they seem to have reported the exit code -5.

For some reason I can only see a reported error for one of your WUs, <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=910782">this one</a>. As you say, though, it is the dreaded -5, which is pretty useless diagnostically. It just means the application crashed, which you already knew. :( The number of these from users in the northern hemisphere will probably rise markedly with the increasing heat. Once you have eliminated overheating, flaky memory, etc., you are into the possibility of software conflict or incompatibility, so the chances of your tracking it down may not be high. I agree that the change from Hadsm 4.04 to 4.12 is probably significant, so you may have to wait for a new version. |
Joined: 3 Sep 04 Posts: 3 Credit: 796,077 RAC: 0 |
> I assume you closed and restarted BOINC.

Yeah, restarted the entire machine. Looked at the graphics & it's fully interactive and it's been working fine since. I'm reasonably confident my hardware is still in pristine working order; I built it with the finest parts available at the time: AthlonXP 3200+, Corsair XMS Pro DDR400 1GB (matched pair) 2-2-2-5-1T, ATI Radeon 9800 AGP 128MB, SB Live! Digital Platinum, Biostar M7NCD (nForce2 chipset) - no overclocking, strictly better performing components. Thanks for the ideas, Andrew. At least for now things are ok despite the unexplainable app crash. :) |
Joined: 10 Oct 04 Posts: 223 Credit: 4,664 RAC: 0 |
Hi Travis and Smudodd,

Maybe you are both suffering from the occasional incompatibility of Athlons with cpdn boinc, which has something to do with how the processor handles the calculations. Most Athlons handle cpdn boinc perfectly, but when there is a problem, it is more often with an Athlon than a Pentium, and -5 seems to be the typical error code indicating this. I gave up the struggle to make boinc cpdn work on my Athlon and moved back to classic cpdn, which works beautifully. If you want to do this, you'll have to wait till the Milton Keynes server is up again after the outage, then download classic from the Open Uni course link. Classic gives you no boinc credit, but it runs at about the same speed, gives you graphics and is equally useful to the researchers. |