All work units received since 1-Aug-18 get a "Computation error"

RogerM

Joined: 31 Aug 04
Posts: 2
Credit: 4,520,346
RAC: 0
Message 58699 - Posted: 5 Sep 2018, 19:48:48 UTC

The work units run for varying lengths of time, so I have been burning through a lot of CPU time without getting any credit since the project came back online on 1-Aug-18. Here's a typical work unit: https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11584208. And here's another: https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11606997. The work units also appear to fail on other computers. Is this a known issue, and is there something I can do about it?

Thank you.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 58715 - Posted: 6 Sep 2018, 16:43:51 UTC - in response to Message 58699.  

I started looking at that PC's failures, from the ones around early Aug until now. At first I thought that maybe you had incredibly bad luck with the SAM25 models, which have a pretty high failure rate overall. But then I saw that you had PNW, CAM and CAF failures as well. All the SAM and CAM failures were signal 11, while the PNW ones weren't.

Did anything change on your PC around August 1st?

Besides basic maintenance, such as blowing out the air ducts with compressed air while the PC is shut down and making sure there is some space between the vents and the surface it sits on, you could exclude the BOINC program files and data folders from antivirus scanning.

On the failures that occurred after quite some time and some returned trickles, the stderr log shows lots of "suspends". CPDN tasks are more prone to failure when they are suspended often. In the computing preferences in BOINC Manager, you should un-tick "Suspend when computer is in use" and "Suspend when non-BOINC usage is above xx percent" and tick "Leave non-GPU tasks in memory when suspended".
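
If anyone would rather set these three preferences in a file than in the Manager, here is a rough Python sketch that builds the contents of a global_prefs_override.xml. It assumes the check boxes map onto the standard BOINC tags run_if_user_active, suspend_cpu_usage and leave_apps_in_memory. The function name build_override_xml is made up for illustration, and the script only prints the XML rather than guessing where your BOINC data directory is.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the BOINC Manager check boxes to override-file tags.
OVERRIDES = {
    # Un-tick "Suspend when computer is in use".
    "run_if_user_active": "1",
    # 0 is assumed to disable "Suspend when non-BOINC usage is above xx percent".
    "suspend_cpu_usage": "0",
    # Tick "Leave non-GPU tasks in memory when suspended".
    "leave_apps_in_memory": "1",
}


def build_override_xml(settings=OVERRIDES):
    """Return the override preferences as an XML string (hypothetical helper)."""
    root = ET.Element("global_preferences")
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; saving it into the BOINC data directory is left to the reader.
    print(build_override_xml())
```

Saving the printed text as global_prefs_override.xml in the BOINC data directory and then having the client re-read its local preferences should apply it, but treat that as an assumption and check the settings in the Manager afterwards.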

Finally, you might want to set the climateprediction.net project to "No new tasks", remove it from BOINC Manager, and then re-add it to make sure there are no corrupted files in the projects/climateprediction.net directory.
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 58716 - Posted: 6 Sep 2018, 19:26:19 UTC - in response to Message 58715.  

> I started looking at that PC's failures, from the ones around early Aug until now. At first I thought that maybe you had incredibly bad luck with the SAM25 models, which have a pretty high failure rate overall. But then I saw that you had PNW, CAM and CAF failures as well. All the SAM and CAM failures were signal 11, while the PNW ones weren't.

I am not sure of the difference between a "signal 11" failure and anything else. But I had a pnw25 fail recently without signal 11. It may have been due to lack of space on my ramdisk; I was not around at the time to check it.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21263860

It seems that otherwise each of RogerM's failures can be explained by bad luck (very bad luck). It is very strange.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58717 - Posted: 6 Sep 2018, 19:49:18 UTC

It may also be overclocking.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58719 - Posted: 6 Sep 2018, 20:49:56 UTC

Do you Suspend BOINC, then Exit BOINC before shutting down the computer?

Do you allow Windows to apply updates while climate models are running?
WB8ILI

Joined: 1 Sep 04
Posts: 161
Credit: 81,512,201
RAC: 928
Message 58726 - Posted: 7 Sep 2018, 19:50:53 UTC

I am having the same problem (Signal 11) on one of my computers. I know there were a lot of segmentation violations in the past, but as I remember, most of those were on Linux machines. Mine, however, is a Windows 10 laptop.

I am not overclocking, the CPU temperature is reasonable, and I am not installing Windows updates or suspending work.


https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1317652
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 58727 - Posted: 7 Sep 2018, 22:05:35 UTC - in response to Message 58726.  
Last modified: 7 Sep 2018, 22:06:52 UTC

> I am having the same problem (Signal 11) on one of my computers. I know there were a lot of segmentation violations in the past, but as I remember, most of those were on Linux machines. Mine, however, is a Windows 10 laptop.
>
> I am not overclocking, the CPU temperature is reasonable, and I am not installing Windows updates or suspending work.
>
> https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1317652

Looks like all your recent failures were SAM25 models from batch 742. All of those had at least one failure on another PC before you downloaded them. This batch has a high failure rate relative to a lot of other batches. I appear to have been lucky so far, though, with 4 completions and 3 more running with at least one trickle, and no failures from that batch.

You're running two EU25s now, so hopefully you'll have more luck with them.
sinusoid

Joined: 7 Dec 07
Posts: 1
Credit: 13,223,647
RAC: 967
Message 59002 - Posted: 13 Nov 2018, 17:15:55 UTC - in response to Message 58727.  

I am having the same issues. Most of the recent ones are WAH, and they keep having errors. No overclocking, and over half the time I am not even on the computer while it is working.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 59004 - Posted: 14 Nov 2018, 0:39:48 UTC - in response to Message 59002.  

> I am having the same issues. Most of the recent ones are WAH, and they keep having errors. No overclocking, and over half the time I am not even on the computer while it is working.


Most of the errors on that computer are on SAM25 models with signal 11 errors. The SAM25 models are very sensitive and quite a few computers are having those problems. Hopefully you'll pick up some of the different WAH2 regions from now on.
