Questions and Answers : Unix/Linux : Lots of tasks end with "Error While Computing". Is there a problem at my end?
Author | Message |
---|---|
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
Most of my CPDN tasks ended with "Error While Computing." I'm wondering if this is the fault of something on my computer, or if I am doing or did something wrong. I don't run my computer 24/7. I start it up and shut it down about twice a day. I try to protect my BOINC work units by suspending projects before I shut down, and resuming them after I boot my computer. I understand some work units will end with "Error While Computing", but I noticed the most probable time for a work unit to end in this state is just after I resume CPDN. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Well, one of the things that you could try is to set "Suspend work if CPU usage is above" to zero. In other words, don't constantly stop and start BOINC (and the climate program). A non-zero setting may be fine for other projects, but the climate models don't like it, and sooner or later ... |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
I don't have an exact option "Suspend work if CPU usage is above". I did recently change the option "Use at most ___ % of CPU time" to 85%. My CPU tends to run hot when I'm running BOINC. As Spring was ending, it was getting warm in this room. It is poorly insulated and has lots of glass, so there was some danger of my computer overheating. So, I set it to 90% and then to 85%. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The option that I mentioned is in the Computing preferences of your Account page on the project's server. These climate models DON'T like being interrupted. Sooner or later they'll crash. LOTS of your models are crashing. If it's too hot to run your computer any other way, perhaps you shouldn't run climate models. |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
I do have that option; I didn't notice it "yesterday" when I looked. I have always had it set to zero. I'm going to put CPDN on "no new tasks" until I can look at my computer's temperature issues. If I get it fixed, it might not be for several months. If I don't get it fixed, I'll detach from the project. I'll let the remaining work unit run when I have the computer on at night. |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
I'll detach from the project when the current work unit finishes. Even if I resolve the cooling issue, I won't be running my computer 24/7. |
Send message Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
You could set your CPU usage down to 50%, or even lower, to see if it runs cooler for now. Other than that, time to save up for liquid cooling! 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
There is another possibility -- dust. I find it necessary to vacuum the innards of my machines occasionally and blow dust (pressurized air in cans) from vanes of CPU heat sink. These always-running machines are not only effective heaters, they are also effective air filters... The additional heat is welcome in Winter but not in Summer! Hope that helps. We don't want to lose you. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 27 Jul 13 Posts: 14 Credit: 100,367 RAC: 0 |
That my CPU runs hot is why I lowered the maximum CPU use of BOINC to 90% and then 85% last month. In cooler seasons, I let BOINC use 100% of my CPU. Even allowing BOINC 100% of my CPU, I still had a lot of work units end with "Error While Computing." So, my CPU running hot is a side issue. Given what was posted here before I mentioned the temperature issue, and lacking any suggestions for a software or hardware fix other than those related to CPU temperature, I have to conclude that the problem seems to be the result of my not running my computer 24/7. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
I run Linux and most nights I turn my computer off. I use the suspend-to-disk option, and with it most tasks run to completion. The computer running hot could be a factor in the number of errors you get. Another option is to go into the BIOS and underclock it a bit. |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
This is a different problem but fits the thread subject exactly. Most, but not all, of recent batch of work units give me the following error after about 1.5 minutes on one of my computers (1267447). Any ideas as to what the problem might be or how to find the problem? <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> SIGSEGV: segmentation violation Stack trace (13 frames): /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x836e1cf] [0xb0f9d400] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8136129] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813c074] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8131c87] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813d6aa] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8133fca] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8078e6f] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82d73ae] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f8867] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f14bb] /home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f97f6] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb0dac4d3] Exiting... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7054, selfPID=7050, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> |
Send message Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
CPDN on Linux or Windows with Intel chips doesn't like either hot operating conditions or overclocked chips. AMDs are fine overclocked as long as the heat issue is addressed. Solve either or both as applicable and you're not likely to have as many failures. The options are either oversized air coolers, which may or may not allow for the re-installation of the side cover (and can be quite loud), or liquid cooling, which runs fairly quiet (and most likely will reduce your core temps by 20 C, but might require a new tower). I have both an AMD 8350 and an Intel 4770K. Both are outfitted with a Seidon M120. Both run CPDN 24/7. The AMD, being more power hungry, runs at approx. 58 C; the Intel, not being as power hungry, runs at approx. 48 C. The AMD runs at 4.4 GHz (overclocked, no problem). The Intel runs at 3.5 GHz (produces lots of failures if overclocked, but only on CPDN). Obviously, though, different configurations using either chip will vary depending on other variables such as motherboard and RAM, thus producing different results from mine. However the end does support the mean, regardless. 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Thanks Ron Crouch for the reply. While I won't completely dismiss your suggestion, let me add the following: The CPU is a Phenom II X4 945 Quad. It is NOT overclocked. The CPU temp is running at 55C, which I think is on the reasonably low side. The CPDN tasks that have had an error all end after about 90 sec with the same traceback list. I would think that if heat were the problem, the failures would be at random points. I am thinking I might have some out-of-date libraries or maybe missing libraries. But I don't have the knowledge to figure that out. If this is the case, it must be some missing library function that isn't called or used in most of the tasks. I have checked (tried to update) libc.so.6 (last entry in the traceback) and the Update Manager indicates that it is up to date. So, for now I will run some Einstein tasks. |
Send message Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
My suggestion wasn't intended to necessarily address your particular problem. If you are running Linux x86_64, for instance, then you would also need to install the 32-bit libraries for libc.so.6. That your client is segfaulting on startup indicates that some libraries may be missing or incompatible. 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Ron - Missing or incompatible libraries is my thinking too. But I don't know how to find which one or ones. I am using 32-bit Ubuntu 12.04. |
Send message Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
Yes, it can sometimes be a royal pain trying to sort some things out. I don't use Ubuntu so I might suggest trying their forums. And 55 C is fine as long as that's the max under full system load. Would be very bad if that were the idling temp (should be around 23 C). You may need to do a major update to your Ubuntu version to bring it up to 14.04 LTS. 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
WB8ILI: use the `ldd' command as described in this thread. That tells you which shared library files CPDN applications are using. Then use `dpkg-query --search filename' to find the owning package, and check for updates. (No doubt there's a quicker way to do this, but I don't know it off the top of my head.) By the way, I'm also using 32-bit 12.04 LTS. It's fine; there's no need to upgrade to 14.04 yet, if you don't want to. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's also the opposite problem, where an older version of a program is needed by some of the models. I've come across this with the graphics of one model type, but I forget the details. I think that may be in another thread from early in the year. |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Greg - I did the ldd and dpkg-query commands. All the libraries shown were part of libc6. I reinstalled that. No option for an older version. It will be a day or two before I can download another CPDN model - (already have enough work). 55C is my running 4 tasks temp. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
55C isn't likely to cause problems. |
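
The library check suggested in the thread (run `ldd` on the CPDN application, then `dpkg-query --search` on each file it reports) could be scripted roughly as follows. This is only a sketch, not a project-supplied tool: the function names `parse_ldd` and `owning_package` and the sample text are made up for illustration, and the sample is in the usual `ldd` layout rather than output from any poster's machine. It assumes Python 3.7 or later and, for the package lookup, a Debian or Ubuntu system.

```python
import subprocess


def parse_ldd(output):
    """Split `ldd` output into resolved libraries and missing ones.

    Returns (resolved, missing): resolved maps library name -> file path,
    missing lists the names that ldd reported as "not found".
    """
    resolved, missing = {}, []
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # loader line, or anything else without a mapping
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            missing.append(name)
        elif target.startswith("/"):
            # drop the trailing load address, e.g. "(0xb7590000)"
            resolved[name] = target.split(" (")[0]
    return resolved, missing


def owning_package(path):
    """Ask dpkg which package owns a file (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "--search", path],
        capture_output=True, text=True, check=False,
    )
    # dpkg-query prints "package: /path"; keep the part before the colon
    return result.stdout.split(":")[0] if result.returncode == 0 else None


# Hypothetical sample text, for illustration only.
SAMPLE = """\
\tlinux-gate.so.1 =>  (0xb77a0000)
\tlibm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xb7740000)
\tlibc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7590000)
\tlibexample.so.1 => not found
\t/lib/ld-linux.so.2 (0xb77a1000)
"""

if __name__ == "__main__":
    resolved, missing = parse_ldd(SAMPLE)
    print("resolved:", resolved)
    print("missing:", missing)
```

To use it on a real application, pass the text printed by `ldd` on the model executable to `parse_ldd`, then call `owning_package` on each resolved path. Any name in the missing list is a library the loader cannot find. A clean result does not rule out a version mismatch, which is the case WB8ILI ran into when everything turned out to belong to libc6.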
©2024 cpdn.org