Questions and Answers : Unix/Linux : Computer hang up
Author | Message |
---|---|
Joined: 9 Aug 04 Posts: 2 Credit: 561,312 RAC: 2,330 |
Hello, when CP runs on my computer, the computer hangs. I can't find a reason for this problem. My computer: AMD K7 1.4 GHz, K7S5A mainboard, 512 MB RAM, 80 GB hard disk, kernel 2.6.5, KDE 3.2.3. This is the output from the start of CP on my computer: 2004-08-10 08:25:14 [---] Starting BOINC client version 4.02 for i686-pc-linux-gnu 2004-08-10 08:25:14 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults 2004-08-10 08:25:14 [climateprediction.net] Host ID is 767 2004-08-10 08:25:14 [---] General prefs: from climateprediction.net (last modified 2004-08-09 12:55:30) 2004-08-10 08:25:14 [---] General prefs: using separate prefs for home 2004-08-10 08:25:14 [---] get_local_network_info(): gethostbyname failed 2004-08-10 08:25:14 [---] get_local_network_info(): gethostbyname failed 2004-08-10 08:25:14 [climateprediction.net] Resuming computation for result 019j_100026624_0 using hadsm3 version 4.02 Starting model in /home/andreas/CP/projects/climateprediction.net... Created shared memory region key = 24870 Env Used=LD_LIBRARY_PATH=/home/andreas/CP/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib Starting model ID 019j_100026624 Phase 1 Stack size=48.00 MB Waiting for model startup, this may take a minute... 019j_100026624 - PH 1 TS 005185 - 00/00/0000 00:00 - H:M:S=0008:06:43 AVG= 5.63 DLT= 0.00 The stderr_um.txt is empty and there was no error message on the terminal. I hope you can help me. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
How long do you wait after the "model startup": minutes, hours? If you run top in another window, do you see hadsm3um_4.02_i686* running? If not, it could be a problem in the crash detection. |
Joined: 9 Aug 04 Posts: 2 Credit: 561,312 RAC: 2,330 |
CP ran for some hours without problems. Then the computer hung, so I had to restart it (even CTRL+ALT+DEL didn't work). This happened about four times. |
Joined: 29 Aug 04 Posts: 2 Credit: 51,607 RAC: 0 |
I have experienced a very similar problem. Running result name 03e5_300029382_0 (I have applied the minor edit to the client_state.xml file) on boinc_4.05_i686-pc-linux-gnu. I have had two different machines lock up after running the experiment for varying periods of time (using the two machines I have just about limped into phase 2). They do not respond to ctrl+alt+del, nor to the alt+SysRq magic keys, and have to be powered down manually. I have tried running the machines with Mandrake 10 Official (kernel-2.6.3.16mdk) and SUSE 9.1 (kernel-2.6.5-7.108-default), while using the same hostname each time. I have tarred up my boinc directory to move it between machines and to back up between reinstallations. The two machines have the following spec: 1. AMD Duron 800MHz, 512MB PC133 SDRAM, VIA KT133 chipset. 2. AMD Athlon 2500+, 512MB PC2700 DDR RAM, NVIDIA nForce2 chipset. I run the experiment with the following command: mv boinc.log boinc.old.log && ./boinc_4.05_i686-pc-linux-gnu >>boinc.log 2>>error.log There are no errors in the error log and nothing beyond the standard entries in the other log. There are no clues in my syslog or kernel logs to indicate why the machines have frozen. I have eliminated overheating as a possible cause by using additional case fans; indeed, on at least one occasion the computer has locked up after a clean boot, before it has had the chance to warm up. Given these problems and the fact that I don't wish to ruin my computers, I don't feel I can carry on with this experiment. Meanwhile my AMD64 machine is running a different result (2m1q_100143119_0) without problems. |
Joined: 2 Sep 04 Posts: 3 Credit: 70,761 RAC: 0 |
I have the same problem, and I think the recurring theme is AMD, as this works correctly on the Intel machines I have with the same Linux kernel revision. The only thing I have found in addition to the above posts is that this doesn't happen running Seti on the exact same machine with the same boinc version etc. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC on Intel & Windows is some sort of hidden & random "crashAMD()" routine! ;-) |
Joined: 2 Sep 04 Posts: 3 Credit: 70,761 RAC: 0 |
> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC > on Intel & Windows is some sort of hidden & random "crashAMD()" > routine! ;-) > > Yeah that'll be the "Intel inside" bit :-) |
Joined: 2 Sep 04 Posts: 3 Credit: 70,761 RAC: 0 |
> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC > on Intel & Windows is some sort of hidden & random "crashAMD()" > routine! ;-) > > Carl - Out of interest, as a project administrator are you able to do a search to see how many AMD Linux 2.6 kernel machines are actually returning results? I am kinda interested to see if this is a problem with my build or a problem with the CPDN Fortran code etc. |
Joined: 29 Aug 04 Posts: 2 Credit: 51,607 RAC: 0 |
> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC > on Intel & Windows is some sort of hidden & random "crashAMD()" > routine! ;-) > > Indeed, it might even be the cause of a lawsuit: http://www.theregister.co.uk/2005/07/12/amd_vs_intel_code/ |
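
The check suggested earlier in the thread (looking for hadsm3um_4.02_i686* in top) could also be scripted. The following is only a rough sketch, not anything shipped with BOINC or CPDN: it assumes a Linux-style /proc filesystem, and the function name `find_processes` and its `proc_root` parameter are made up for illustration.

```python
import os


def find_processes(name_fragment, proc_root="/proc"):
    """Return sorted (pid, command line) pairs whose command line contains
    name_fragment.

    Hypothetical helper: assumes a Linux-style /proc layout. proc_root is a
    parameter only so the function can be pointed at a test directory.
    """
    if not os.path.isdir(proc_root):
        return []  # no /proc here (not Linux), so nothing to report
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are process IDs
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process has exited or is not readable
        # /proc/<pid>/cmdline separates arguments with NUL bytes
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if name_fragment in cmdline:
            matches.append((int(entry), cmdline))
    return sorted(matches)


if __name__ == "__main__":
    # "hadsm3um" is the prefix of the model binary named in the thread
    running = find_processes("hadsm3um")
    if running:
        for pid, cmdline in running:
            print(pid, cmdline)
    else:
        print("no hadsm3um process found")
```

If this prints "no hadsm3um process found" while the BOINC client still shows the result as running, that would point towards the crash-detection problem mentioned above rather than a full machine lockup. It cannot help once the whole computer has frozen, since nothing in user space runs at that point.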
©2024 cpdn.org