Quit running climateprediction.net???

Questions and Answers : Unix/Linux : Quit running climateprediction.net???
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,193,804
RAC: 2,852
Message 13435 - Posted: 14 Jun 2005, 10:25:36 UTC

Almost all my climateprediction applications get client error messages, on both my machines. I have made repeated posts about this and gotten no useful answer. The only answer I recall was that perhaps my machine was overheating, which was not the case.

Should I quit running climateprediction? My machines seem to have no trouble running setiathome and proteinfolding applications.
belgix

Joined: 5 Aug 04
Posts: 85
Credit: 2,924,043
RAC: 0
Message 13440 - Posted: 14 Jun 2005, 15:03:12 UTC

From the little information you have given us, you seem to have a problem with your internet connection, but I also noticed you are still running BOINC 4.13.

Upgrade your BOINC software to version 4.19 or 4.43 and reset your climateprediction.net account, because you might have some data corruption in your client_state.xml file. Unless you have some older WUs (which would make no sense to me if your computer crashes often), BOINC should download hadsm3_4.13_windows_intelx86.exe, not hadsm3_4.12_windows_intelx86.exe.
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,193,804
RAC: 2,852
Message 13447 - Posted: 14 Jun 2005, 18:43:41 UTC - in response to Message 13440.  
Last modified: 14 Jun 2005, 18:47:05 UTC

> From the little information you have given us, you seem to have a problem
> with your internet connection, but I also noticed you are still running
> BOINC 4.13.
>
> Upgrade your BOINC software to version 4.19 or 4.43 and reset your
> climateprediction.net account, because you might have some data corruption in
> your client_state.xml file. Unless you have some older WUs (which would make
> no sense to me if your computer crashes often), BOINC should download
> hadsm3_4.13_windows_intelx86.exe, not hadsm3_4.12_windows_intelx86.exe.
>
>
If I am having trouble with my Internet connection, why does it never cause problems with setiathome or proteinfolding? I do not notice problems with the Internet connections with web browsing or e-mail either.

I have been running boinc 4.43 since June 4; I ran boinc 4.19 since February 3.

My computer does not crash. I reboot it only when I get a new kernel, which averages about once a month, and when the power fails for over an hour (the amount of time my UPS can keep it up).

Since I run Linux, there is no reason to download hadsum...windows... anything. And since I posted this to the Unix/Linux list, why tell me about windows at all?

Right now it has:

-rwx------ 1 boinc boinc 2862026 Apr 5 04:49 hadsm3_4.13_i686-pc-linux-gnu
-rwx------ 1 boinc boinc 2999312 Apr 5 04:50 hadsm3data_4.13_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 boinc boinc 6122319 Aug 23 2004 hadsm3se_4.04_i686-pc-linux-gnu
-rwxrwxr-x 1 boinc boinc 3824406 Mar 2 10:33 hadsm3se_4.10_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 3674708 Mar 3 13:43 hadsm3se_4.11_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 6202657 Mar 29 13:04 hadsm3se_4.12_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 6122319 Aug 23 2004 hadsm3se_4.13_i686-pc-linux-gnu
-rw------- 1 boinc boinc 4113910 Apr 5 05:18 hadsm3se_4.13_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 boinc boinc 10484517 Aug 23 2004 hadsm3um_4.04_i686-pc-linux-gnu
-rwxrwxr-x 1 boinc boinc 10097740 Mar 2 09:03 hadsm3um_4.10_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 8315771 Mar 3 12:23 hadsm3um_4.11_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 11074087 Mar 29 12:53 hadsm3um_4.12_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 10484517 Aug 23 2004 hadsm3um_4.13_i686-pc-linux-gnu
-rw------- 1 boinc boinc 4010230 Apr 5 05:16 hadsm3um_4.13_i686-pc-linux-gnu.zip
-rwxr-xr-x 1 boinc boinc 1223002 Sep 7 2004 hadsm3viz_4.04_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 2134914 Mar 2 11:50 hadsm3viz_4.10_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 1712913 Mar 3 13:40 hadsm3viz_4.11_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 1712913 Mar 29 15:18 hadsm3viz_4.12_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 1712913 Apr 4 14:38 hadsm3viz_4.13_i686-pc-linux-gnu
-rwxr-xr-x 1 boinc boinc 2366350 Jun 22 2004 libGL.so.1
-rwxr-xr-x 1 boinc boinc 502756 Jul 5 2004 libGLU.so.1
-rwxr-xr-x 1 boinc boinc 320516 Jun 22 2004 libglut.so.3
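
One way to double-check that a listing like the one above holds only Linux builds is to split each file name into application, version, and platform. What follows is a minimal Python sketch, not anything shipped with BOINC: the naming pattern is inferred from this listing and from the Windows file name mentioned earlier in the thread, and `parse_app_file` and `foreign_platform_files` are hypothetical helper names.

```python
import re

# Assumed pattern: <app>_<major.minor>_<platform>, optionally ending in .zip or .exe.
# This is inferred from the directory listing, not taken from BOINC documentation.
NAME_RE = re.compile(
    r"^(?P<app>[a-z0-9]+)_(?P<version>\d+\.\d+)_(?P<platform>.+?)(?:\.zip|\.exe)?$"
)


def parse_app_file(name):
    """Return (app, version, platform), or None if the name does not fit the pattern."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return match.group("app"), match.group("version"), match.group("platform")


def foreign_platform_files(names, expected="i686-pc-linux-gnu"):
    """Return the application files whose platform differs from `expected`."""
    foreign = []
    for name in names:
        parsed = parse_app_file(name)
        # Names that do not fit the pattern (shared libraries, say) are skipped.
        if parsed is not None and parsed[2] != expected:
            foreign.append(name)
    return foreign
```

Applied to the names in the listing, `foreign_platform_files` should return an empty list, since every application file there carries the `i686-pc-linux-gnu` suffix.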

Arnaud

Joined: 3 Sep 04
Posts: 268
Credit: 256,045
RAC: 0
Message 13452 - Posted: 14 Jun 2005, 20:19:53 UTC

Just because SETI or P@H runs fine does not mean that CPDN must run fine.
CPDN stresses hardware far more than other DC projects, so don't take for granted that your machine is stable.

Belgix perhaps speaks of Windows because your account shows an error with Windows apps (I don't know why: are you on dual-boot?). Perhaps he made a mistake. He was just trying to help...:o)

Try a fresh install of BOINC 4.32 or 4.43 with the CPDN 4.13 apps, which are stable for me. Test your machine with programs like memtest, superpi and prime95 (I don't know if these programs exist for Linux, so try them if you have a dual-boot with Windows).


old_user248

Joined: 6 Aug 04
Posts: 65
Credit: 1,605,224
RAC: 0
Message 13458 - Posted: 15 Jun 2005, 1:16:19 UTC - in response to Message 13452.  
Last modified: 15 Jun 2005, 1:19:57 UTC

> CPDN stresses hardware far more than other DC projects, so don't take for
> granted that your machine is stable.
>
> for me. Test your machine with programs like memtest, superpi and prime95 (I
> don't know if these programs exist for Linux,

memtest86 runs from a diskette (the floppy can actually be created from Linux) and is available at: http://www.memtest86.com/

Prime95 does exist for Linux at: http://www.mersenne.org/freesoft.htm

Run the torture test (-t) for a while; if there are any stability problems, it is usually good at finding them.

DaveN
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,193,804
RAC: 2,852
Message 13466 - Posted: 15 Jun 2005, 11:07:41 UTC - in response to Message 13452.  
Last modified: 15 Jun 2005, 11:52:18 UTC

> Just because SETI or P@H runs fine does not mean that CPDN must run fine.
> CPDN stresses hardware far more than other DC projects, so don't take for
> granted that your machine is stable.

I have no trouble running IBM's DB2 V8.1.7 DBMS, with an application that takes a couple of hours to populate a database. Of course, that stresses the machine in a different way than the climateprediction application.

The respondent said I was having problems with my Internet connection, and my response was that if it was an Internet connection problem, why would I not have trouble with s@h or P@h? This should have nothing to do with hardware stress since running the Internet connection is not stressful for anything.

Since there are two completely different systems giving the same error messages, I hardly think it is a common hardware problem.
>
> Belgix perhaps speaks of Windows because your account shows an error with
> Windows apps (I don't know why: are you on dual-boot?). Perhaps he made a
> mistake. He was just trying to help...:o)

I have two machines: my main one, and an older one. The main one is Linux-Only, running a fully up to date Red Hat Enterprise Linux 3 with two 3.06GHz Intel Xeon hyperthreaded processors on a SuperMicro X5DP8-G2 motherboard with 4096 MBytes ECC RAM. It is quite warm here the last few days, with the processor running like this:

CPU0 fan: 4018 RPM (min = 375 RPM, div = 8)
CPU1 fan: 3375 RPM (min = 375 RPM, div = 8)
System: +46C (limit = +50C, hysteresis = +48C) sensor = thermistor
CPU0: +51.5C (limit = +60C, hysteresis = +58C) sensor = thermistor
CPU1: +51.5C (limit = +60C, hysteresis = +58C) sensor = thermistor

N.B.: Intel requires that the processors never exceed 70C, and they do not: the Intel-supplied cooling fans (60mm x 38mm) are thermistor controlled and speed up as the System temperature increases to keep the processors at the proper temperatures. This box has a total of 13 cooling fans. The air intake has a filter that is cleaned monthly, and the inside of the box is amazingly dust free, with only very fine stuff at the bottom of the box (that is vacuumed out at the same time as I clean the filter). The CPU heat sinks are not dusty.
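
The margin between each reading and its configured limit can also be computed rather than read off by eye. This is a small Python sketch that assumes the sensor output always has the `label: +NNC (limit = +NNC, ...)` shape shown above; `temperature_headroom` is a hypothetical helper name, and the regular expression is fitted to these five lines only.

```python
import re

# Matches temperature lines such as
#   "CPU0: +51.5C (limit = +60C, hysteresis = +58C) sensor = thermistor"
# Fan lines ("CPU0 fan: 4018 RPM ...") do not match because no "C" follows the number.
TEMP_RE = re.compile(
    r"^(?P<label>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)C\s*"
    r"\(limit = \+?(?P<limit>-?\d+(?:\.\d+)?)C"
)


def temperature_headroom(sensor_text):
    """Map each temperature label to (reading, limit, limit - reading) in degrees C."""
    headroom = {}
    for line in sensor_text.splitlines():
        match = TEMP_RE.match(line.strip())
        if match:
            temp = float(match.group("temp"))
            limit = float(match.group("limit"))
            headroom[match.group("label")] = (temp, limit, limit - temp)
    return headroom
```

For the readings quoted above this gives a margin of 4 degrees for System and 8.5 degrees for each CPU against the configured limits, which are themselves 10 degrees below the 70C ceiling mentioned in the post.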

The older one runs Red Hat Linux 9 (with all of Red Hat's updates for it, though it is no longer supported) with two 550 MHz Pentium III processors on a Tyan Tiger 100 motherboard and 512 Megabytes RAM. The older one is dual-boot, but when running Windows XP, it does not run BOINC.

Both the machines run setiathome and proteinfolding OK, and both fail with climateprediction.
>
> Try a fresh install of BOINC 4.32 or 4.43 with CPDN 4.13 apps that are stable
> for me.

I am downloading a fresh copy of Boinc 4.43 as I type this. If it compares identically with the one I am currently running, I doubt I would replace it.

$ cmp boinc_4.43_i686-pc-linux-gnu.sh /tmp/boinc_4.43_i686-pc-linux-gnu.sh
$

Source compares.

$ cmp boinc ~boinc/BOINC/boinc
$

Executable compares.
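
The two `cmp` runs above print nothing, which is how `cmp` reports that the files are byte-for-byte identical. The same check can be sketched in Python by comparing SHA-256 digests; `sha256_of` and `same_contents` are hypothetical helper names, and the paths in the usage note are simply the ones from the commands above.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_contents("boinc_4.43_i686-pc-linux-gnu.sh", "/tmp/boinc_4.43_i686-pc-linux-gnu.sh")` would be expected to return `True` whenever `cmp` is silent on the same pair. A digest also has the advantage that it can be compared against a published checksum, if the project provides one.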

CPDN is running 4.13 applications already.

> Test your machine with programs like memtest,

I just ran memtest86 v3.0 for 9 hours. It completed 7 passes with no errors or warnings.

> superpi and prime95 (I
> don't know if these programs exist for Linux, so try them if you have a
> dual-boot with Windows)
>
I do not have those, and my main machine will not run Windows (no license, for one thing, and I do not want it contaminated with Microsoft stuff for another).

I just downloaded mprime (prime95?) and started running it with
mprime -t as suggested by another poster.

Its stress.txt file says it tests, among other things, the L1 and L2 caches of the processors. I infer it does not directly test the L3 caches of these processors (although if they are failing, I am sure something would turn up.). So far, after about an hour, it has not turned up anything:

$ ./mprime -t
Beginning a continuous self-test to check your computer.
Please read stress.txt. Hit ^C to end this test.
Test 1, 4000 Lucas-Lehmer iterations of M19922945 using 1024K FFT length.
Test 2, 4000 Lucas-Lehmer iterations of M19922943 using 1024K FFT length.
Self-test 1024K passed!
Test 1, 800000 Lucas-Lehmer iterations of M172031 using 8K FFT length.
Test 2, 800000 Lucas-Lehmer iterations of M163839 using 8K FFT length.
Self-test 8K passed!
Test 1, 560000 Lucas-Lehmer iterations of M212991 using 10K FFT length.
Test 2, 560000 Lucas-Lehmer iterations of M210415 using 10K FFT length.
Test 3, 560000 Lucas-Lehmer iterations of M208897 using 10K FFT length.

I am not sure how much stress it can put on my machine, since it runs nice 19 and there are four BOINC processes running also at nice 19, so it is getting only about 20% of my machine's processing power. Should I stop all BOINC applications to ensure a valid test? My machine normally runs at very very close to 100% CPU (i.e., 0% idle) all the time.
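
The "about 20%" estimate can be checked with a little arithmetic. The sketch below rests on three assumptions that the post does not confirm: the two hyperthreaded Xeons appear as four logical CPUs, `mprime -t` runs as a single thread, and the scheduler shares the CPUs evenly among processes at the same nice level. `share_of_machine` is a hypothetical helper name.

```python
def share_of_machine(equal_priority_processes, logical_cpus):
    """Fraction of total machine capacity that one single-threaded process receives.

    Assumes every competing process is single-threaded, always runnable, and at
    the same nice level, so the scheduler divides the logical CPUs evenly.
    """
    # One thread can never use more than one logical CPU.
    per_process_cpu = min(1.0, logical_cpus / equal_priority_processes)
    return per_process_cpu / logical_cpus


# Four BOINC processes plus mprime on four logical CPUs (assumed).
with_boinc = share_of_machine(5, 4)
# mprime alone on the same machine.
without_boinc = share_of_machine(1, 4)
```

Under those assumptions `with_boinc` comes out at 0.2, matching the figure in the post, and `without_boinc` at 0.25. In other words, stopping BOINC would lift mprime from about 80% to 100% of one logical CPU, so the test would run somewhat faster but would still exercise only one logical processor at a time.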

belgix

Joined: 5 Aug 04
Posts: 85
Credit: 2,924,043
RAC: 0
Message 13467 - Posted: 15 Jun 2005, 13:33:12 UTC
Last modified: 15 Jun 2005, 13:50:30 UTC

I never said you get overheating problems with your computer but if you take a look at your last 10 trickles sent (failure only),

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=896204
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=867423

It's quite strange that all those WUs ended with a "Downloading" error and that the BOINC core client version is 4.13. Climateprediction.net recommends using BOINC 4.19.
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,193,804
RAC: 2,852
Message 13477 - Posted: 16 Jun 2005, 2:47:09 UTC - in response to Message 13467.  

> I never said you get overheating problems with your computer

True: another person responded with that one.

> but if you take a
> look at your last 10 trickles sent (failure only),
>
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=896204
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=867423

896204 has exit status 0 (0x0) 31 May 2005 16:54:55 UTC

4.13
app_version download error: couldn't get input files:
hadsm3_4.12_windows_intelx86.exe: signature verification error


0
0

867423 has exit status 2 (0x2) 28 May 2005 18:44:50 UTC

4.13
The system cannot find the file specified. (0x2) - exit code 2 (0x2)

1
0

Well, that is pretty funny when you consider this machine does not run Windows at all, and never has. The above ones seem newer than those listed on the View Computers... stuff, where the most recent failures were like these.

761356: 251 (0xfb) 22 May 2005 17:19:55 UTC
4.19
process exited with code 251 (0xfb)

1
0

No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting



700305: 251 (0xfb) 8 Apr 2005 1:43:48 UTC
4.19
process exited with code 251 (0xfb)

1
0

No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting



679377: 251 (0xfb) 5 Apr 2005 8:31:57 UTC
4.19
process exited with code 251 (0xfb)

1
0

No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting



677569: 251 (0xfb) 3 Apr 2005 12:21:39 UTC
4.19
process exited with code 251 (0xfb)

1
0

No heartbeat from core client for 30.000061 sec - exiting
No heartbeat from core client for 30.000018 sec - exiting
No heartbeat from core client for 30.000050 sec - exiting
No heartbeat from core client for 30.000063 sec - exiting
No heartbeat from core client for 30.000007 sec - exiting
No heartbeat from core client for 30.000007 sec - exiting
No heartbeat from core client for 30.000045 sec - exiting
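
Grouping the failed results by the client version recorded with each one makes the split between the two kinds of failure easier to see. This is only a sketch: the result IDs, versions, and exit codes are copied from the excerpts above, and `group_by_client` is a hypothetical helper name.

```python
from collections import defaultdict

# (result ID, BOINC client version, exit code), copied from the excerpts above.
RESULTS = [
    (896204, "4.13", 0),
    (867423, "4.13", 2),
    (761356, "4.19", 251),
    (700305, "4.19", 251),
    (679377, "4.19", 251),
    (677569, "4.19", 251),
]


def group_by_client(results):
    """Map each client version to its list of (result ID, exit code) pairs."""
    grouped = defaultdict(list)
    for result_id, client_version, exit_code in results:
        grouped[client_version].append((result_id, exit_code))
    return dict(grouped)
```

With this data, both results that mention Windows files fall under client 4.13, while all four "No heartbeat" results with exit code 251 fall under client 4.19.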


>
> It's quite strange that all those WUs ended with a "Downloading" error and
> that the BOINC core client version is 4.13. Climateprediction.net recommends
> using BOINC 4.19.
>
I do not know that they got Downloading Error exactly. There is no reason why my BOINC client should be trying to download windows stuff.

I have been running boinc 4.43 since June 4; I ran boinc 4.19 since February 3.
>
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,193,804
RAC: 2,852
Message 13478 - Posted: 16 Jun 2005, 2:49:38 UTC - in response to Message 13467.  

BTW: mprime -t has now been running for 908 minutes with no errors. I am beginning to suspect that my system is pretty stable. Memtest86 ran 9 hours or a bit over 7 passes with no errors.