climateprediction.net (CPDN) home page
Thread 'Computer hang up'

Questions and Answers : Unix/Linux : Computer hang up

willmannand

Joined: 9 Aug 04
Posts: 2
Credit: 561,312
RAC: 2,330
Message 609 - Posted: 10 Aug 2004, 6:27:05 UTC
Last modified: 10 Aug 2004, 8:51:21 UTC

Hello, when CP runs on my computer, the computer hangs. I can't find the reason for this problem. My computer: AMD K7 1.4 GHz, K7S5A mainboard, 512 MB RAM, 80 GB hard disk, kernel 2.6.5, KDE 3.2.3.

This is the output from the start of CP on my computer:
2004-08-10 08:25:14 [---] Starting BOINC client version 4.02 for i686-pc-linux-gnu
2004-08-10 08:25:14 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2004-08-10 08:25:14 [climateprediction.net] Host ID is 767
2004-08-10 08:25:14 [---] General prefs: from climateprediction.net (last modified 2004-08-09 12:55:30)
2004-08-10 08:25:14 [---] General prefs: using separate prefs for home
2004-08-10 08:25:14 [---] get_local_network_info(): gethostbyname failed
2004-08-10 08:25:14 [---] get_local_network_info(): gethostbyname failed
2004-08-10 08:25:14 [climateprediction.net] Resuming computation for result 019j_100026624_0 using hadsm3 version 4.02
Starting model in /home/andreas/CP/projects/climateprediction.net...
Created shared memory region key = 24870
Env Used=LD_LIBRARY_PATH=/home/andreas/CP/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib
Starting model ID 019j_100026624 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
019j_100026624 - PH 1 TS 005185 - 00/00/0000 00:00 - H:M:S=0008:06:43 AVG= 5.63 DLT= 0.00

The stderr_um.txt is empty and there was no error message on the terminal. I hope you can help me.
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 634 - Posted: 10 Aug 2004, 12:43:50 UTC

How long do you wait after the "model startup" message: minutes, hours? If you run top in another window, do you see hadsm3um_4.02_i686* running? If not, it could be a problem in the crash detection.
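
A minimal sketch of that check as a script, for anyone who would rather not watch top by hand. It assumes a Linux system where /proc/<pid>/comm is available (newer kernels than the ones in this thread); the function name model_running is made up for illustration, and the default prefix hadsm3um is taken from the executable name above. The kernel truncates process names in comm to 15 characters, so the check matches on a prefix only.

```shell
# Hypothetical helper: succeed if any running process has a name that
# starts with the given prefix (default: hadsm3um).
model_running() {
    prefix="${1:-hadsm3um}"
    for comm_file in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read; skip it if so.
        name=$(cat "$comm_file" 2>/dev/null) || continue
        case "$name" in
            "$prefix"*) return 0 ;;
        esac
    done
    return 1
}

# Example use:
#   if model_running; then echo "model is running"; else echo "model not found"; fi
```

If the function reports that the model is not running while the BOINC client still shows the result as active, that would point toward the crash detection rather than the model itself.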

willmannand

Joined: 9 Aug 04
Posts: 2
Credit: 561,312
RAC: 2,330
Message 635 - Posted: 10 Aug 2004, 13:00:05 UTC

CP ran for some hours without problems. Then the computer hung, so I had to restart it (even CTRL+ALT+DEL didn't work). This has happened about four times.
old_user2917

Joined: 29 Aug 04
Posts: 2
Credit: 51,607
RAC: 0
Message 5049 - Posted: 4 Oct 2004, 16:39:02 UTC - in response to Message 635.  

I have experienced a very similar problem. I am running result 03e5_300029382_0 (I have applied the minor edit to the client_state.xml file) on boinc_4.05_i686-pc-linux-gnu. I have had two different machines lock up after running the experiment for varying periods of time (using the two machines I have just about limped into phase 2). They do not respond to ctrl+alt+del, nor to the alt+SysRq magic keys, and have to be powered down manually. I have tried running the machines with Mandrake 10 Official (kernel-2.6.3.16mdk) and SUSE 9.1 (kernel-2.6.5-7.108-default), while using the same hostname each time. I have tarred up my boinc directory to move it between machines and to back it up between reinstallations.
The two machines have the following spec:
1. AMD Duron 800MHz, 512MB PC133 SDRAM, VIA KT133 chipset.
2. AMD Athlon 2500+, 512MB PC2700 DDR RAM, NVIDIA nForce2 chipset.

I run the experiment with the following command:

mv boinc.log boinc.old.log && ./boinc_4.05_i686-pc-linux-gnu >>boinc.log 2>>error.log
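
One thing to watch with the command above: if boinc.log does not exist yet, mv fails and the && stops the client from starting at all. A small sketch of a more forgiving rotation follows; the function name rotate_log is hypothetical, and the client binary name in the usage comment is the one from this post.

```shell
# Hypothetical helper: move NAME.log to NAME.old.log if it exists,
# and succeed quietly when there is nothing to rotate.
rotate_log() {
    log="$1"
    old="${log%.log}.old.log"
    if [ -f "$log" ]; then
        mv -- "$log" "$old"
    fi
}

# Usage:
#   rotate_log boinc.log && ./boinc_4.05_i686-pc-linux-gnu >>boinc.log 2>>error.log
```

The client is still skipped if the move itself fails (for example, on a read-only directory), which keeps the intent of the original && chain.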

There are no errors in the error log and nothing beyond the standard entries in the other log. There are no clues in my syslog or kernel logs to indicate why the machines have frozen. I have eliminated overheating as a possible cause by using additional case fans; indeed, on at least one occasion the computer has locked up after a clean boot, before it has had the chance to warm up.
Given these problems and the fact that I don't wish to ruin my computers, I don't feel I can carry on with this experiment. Meanwhile my AMD64 machine is running a different result (2m1q_100143119_0) without problems.
old_user9850

Joined: 2 Sep 04
Posts: 3
Credit: 70,761
RAC: 0
Message 6879 - Posted: 12 Dec 2004, 9:34:23 UTC

I have the same problem, and I think the recurring theme is AMD, as this works correctly on the Intel machines I have with the same Linux kernel revision.

The only thing I have found in addition to the above posts is that this doesn't happen when running SETI on the exact same machine with the same BOINC version etc.
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 6883 - Posted: 12 Dec 2004, 10:12:53 UTC - in response to Message 6879.  

I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC on Intel & Windows is some sort of hidden & random "crashAMD()" routine! ;-)
old_user9850

Joined: 2 Sep 04
Posts: 3
Credit: 70,761
RAC: 0
Message 6887 - Posted: 12 Dec 2004, 11:40:41 UTC - in response to Message 6883.  

> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC
> on Intel & Windows is some sort of hidden & random "crashAMD()"
> routine! ;-)
Yeah that'll be the "Intel inside" bit :-)
old_user9850

Joined: 2 Sep 04
Posts: 3
Credit: 70,761
RAC: 0
Message 6989 - Posted: 16 Dec 2004, 22:38:53 UTC - in response to Message 6883.  

> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC
> on Intel & Windows is some sort of hidden & random "crashAMD()"
> routine! ;-)
Carl, out of interest: as a project administrator, are you able to run a search to see how many AMD Linux 2.6 kernel machines are actually returning results?

I am kind of interested to see whether this is a problem with my build or a problem with the CPDN Fortran code etc.
old_user2917

Joined: 29 Aug 04
Posts: 2
Credit: 51,607
RAC: 0
Message 14309 - Posted: 12 Jul 2005, 20:13:16 UTC - in response to Message 6883.  

> I kind of wonder if deep in the Intel Fortran compiler we used for CPDN/BOINC
> on Intel & Windows is some sort of hidden & random "crashAMD()"
> routine! ;-)

Indeed, it might even be the cause of a lawsuit:

http://www.theregister.co.uk/2005/07/12/amd_vs_intel_code/
