Message boards : Number crunching : Intel I7 Woes....No successful completion since April 2015
Author | Message |
---|---|
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
I have a number of machines processing files, but my Intel i7-based machine running Windows 7 has not successfully completed processing a data file since April 2015. For the heck of it, I recently tried to reload an old version of the BOINC client (7.0.44), which was the last version on which I processed a file successfully, but I'm still getting errors/failures. I'm about to give up, unless someone has a bright idea. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, Art, Walked thru a couple pages of your i7's failed tasks. Diagnostics are similar throughout, though a few tasks produced at least one Trickle before crashing. My i7 went crazy after M$ sent an upgrade. Boinc 6/2.19 wouldn't restart, wouldn't "repair," wouldn't reinstall nor would a later ver.6. On a whim, rather than try ver.7, an old copy of boinc 5.10.45 was tried -- successfully. So, I'm running an i7-4790 with an antique boinc version in Win.10 upgrade of Upgrade Version (which boinc reports as Vista)! I'm not trying to steal your Thread, Art, merely trying to grasp what is probably a chimera. I mention my flaky i7-box because I wonder about the possibility of some strange interaction between/among i7/other hardware/OS/boinc configurations. In my case, suspicion rests heavily on Win.10 upgrade. (I was considering reinstalling Win.7 when CPDN workload is depleted but now wonder about that step...) The CPU is on a Gigabyte MB, with 2*8GB RAM, no add-on graphics board, runs only five simultaneous CPDN copies and does nothing else. Does anyone else have strange results from an i7 machine, some problem which began after a history of success? "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Thanks... Doesn't seem to be Win 7/Win 10 upgrade related because I've been on Win 7 SP 1 consistently through all of this. Something happened in the April 2015 time-frame on my machine, or there is some strange data-driven problem which affected only my Intel i7 box (machine ID 1266353) in that time-frame. Could have been a MS OS update, but no idea. At this point I'm probably going to remove Climateprediction.net processing from this machine, since it's just wasting cycles unless someone can help diagnose this. Haven't tried going back to boinc ver 5, do you think that might actually work?? Art Masson St. Charles, IL |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
One more bit of info. My CPU is an Intel Core i7-3770 running at 3.4 GHz. Art Masson |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
"Haven't tried going back to boinc ver 5, do you think that might actually work??" Possibly. In my case I consider it sheer dumb luck. The box had three HadCM3n tasks, none of which showed CPU time, percent completed, or time remaining, nor was the graphics option available. (On other machines [Win.10 & Vista], if not everything normal, at least graphics were available.) One Task crashed. Trickles show in my account, though, and eight Trickles showed for that Task -- cause of death, common for this series: "INVALID THETA." One of the three downloaded to that machine currently has 32 Trickles. (Fingers crossed.) (All my machines run at stock speed, expecting Intel to 'do right' with its current accelerate-under-load technology.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
I've downloaded and installed version 5.8.16 from the Boinc site, and will see what happens. At the moment there are no tasks available so no processing is occurring. Will advise. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Some of it is just bad luck, when you look at all the work units that errored out on other machines also. But still, some should have completed successfully, since the WAH2 and Australia-New Zealand ones are fairly robust. It could be other programs running on your PC. I have found problems with some early versions of VirtualBox causing problems on other programs, though not necessarily CPDN. It could be an anti-virus problem also; the exclusions don't always help, but at least exclude the BOINC program and data folders. And your disk drive may have trouble keeping up with the high write rates of some tasks; try running only one at a time. You will find it eventually. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I'm with Jim1348. Possibly an anti-virus/anti-malware problem. Perhaps a change or upgrade to such software last year resulted in problems with boinc and cpdn? |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Anything is possible. However, I run the same version of Norton 360 on 3 other machines with no problems, and it seems to be allowing updates to the BOINC data. If I can get a CPDN work unit downloaded on version 5.8.16, I'll limit the work to a single work unit and see what happens. My new machine ID on version 5.8.16 is 1392340. For whatever it's worth, this machine processes other BOINC projects (SETI@HOME, MILKYWAY@HOME, EINSTEIN, and a couple others) with no problems. It's only the CPDN work units which never complete successfully (since April 2015)..... Will report results on version 5.8.16 when I can get a work unit! Art Masson St. Charles, IL |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Getting this error using version 5.8.16 (can't download work unit). Any advice appreciated:
3/8/2016 6:24:44 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:24:45 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:24:45 AM|climateprediction.net|Backing off 3 hr 41 min 43 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:06 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:07 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:25:07 AM|climateprediction.net|Backing off 2 hr 39 min 18 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz |
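For anyone reading log excerpts like the one above, here is a minimal Python sketch that works out when the client will next retry each download. It assumes the `date time|project|message` layout and the "Backing off N hr N min N sec" wording seen in these six lines; the helper names (`parse_line`, `next_retries`) are made up for illustration, and treating the `hr` and `min` parts as optional is a guess about how shorter delays are printed. It does not diagnose the "http error" itself.

```python
import re
from datetime import datetime, timedelta

# Log lines copied from the post above (BOINC 5.8.16 message log).
LOG = """\
3/8/2016 6:24:44 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:24:45 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:24:45 AM|climateprediction.net|Backing off 3 hr 41 min 43 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:06 AM|climateprediction.net|[file_xfer] Started download of file CRED_SIC_rcp85_a50_1939_1950.gz
3/8/2016 6:25:07 AM|climateprediction.net|[file_xfer] Temporarily failed download of CRED_SIC_rcp85_a50_1939_1950.gz: http error
3/8/2016 6:25:07 AM|climateprediction.net|Backing off 2 hr 39 min 18 sec on download of file CRED_SIC_rcp85_a50_1939_1950.gz
"""

# Assumption: "hr" and "min" may be absent for short delays; "sec" is always present.
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec on download of file (\S+)"
)


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def next_retries(log_text):
    """Return (file name, earliest retry time) for every back-off message."""
    retries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        match = BACKOFF.search(message)
        if match:
            hours, minutes, seconds, file_name = match.groups()
            delay = timedelta(
                hours=int(hours or 0),
                minutes=int(minutes or 0),
                seconds=int(seconds),
            )
            retries.append((file_name, when + delay))
    return retries


if __name__ == "__main__":
    for file_name, retry_at in next_retries(LOG):
        print(f"{file_name}: next attempt no earlier than {retry_at}")
```

Both back-offs refer to the same file, so the later message supersedes the earlier one in practice.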
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
I'm trying a different approach, since I couldn't download a work unit on Boinc 5.8.16. I've reloaded Boinc 7.6.23 and have suspended all work units and all projects except for a single CPDN work unit. I'll let this single work unit run and see if it will complete as the only running task. Art Masson |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
First completion in a year! Ran to completion -- with only one work unit running (and no other projects!). Now will try running BOINC with one CPDN work unit but allowing BOINC to run with other projects simultaneously. Work unit which completed is as follows:
Name: wah2_sas50_fdct_201412_13_348_010324402_0
Workunit: 10324402
Created: 23 Feb 2016 12:47:05 UTC
Sent: 26 Feb 2016 9:24:47 UTC
Received: 18 Mar 2016 7:31:51 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 1266353
Report deadline: 7 Feb 2017 14:44:47 UTC
Run time: 259,668.46
CPU time: 259,215.70
Validate state: Initial
Claimed credit: 0.00
Granted credit: 2,299.53
Application version: Weather At Home 2 (wah2) v7.08
Latest Trickles Received:
Time Sent (UTC) | Host ID | Result ID | Result Name | Phase | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|---|
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 138,539 | 239,327 | 1.7275 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 127,019 | 219,603 | 1.7289 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 115,499 | 199,816 | 1.7300 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 103,979 | 179,918 | 1.7303 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 92,459 | 159,741 | 1.7277 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 80,939 | 139,458 | 1.7230 |
14 Mar 2016 18:34:34 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 69,419 | 119,401 | 1.7200 |
10 Mar 2016 11:25:58 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 57,899 | 99,361 | 1.7161 |
10 Mar 2016 05:57:28 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 46,379 | 79,424 | 1.7125 |
09 Mar 2016 22:07:38 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 34,859 | 59,465 | 1.7059 |
09 Mar 2016 16:45:19 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 23,339 | 39,858 | 1.7078 |
09 Mar 2016 11:14:50 | 1266353 | 19303979 | wah2_sas50_fdct_201412_13_348_010324402_0 | 1 | 11,819 | 20,324 | 1.7196 |
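The "Average (sec/TS)" figures in the trickle list above appear to be cumulative CPU time divided by the timestep count. That reading is an inference from the numbers, not something the project states in this thread. A short Python check, using the twelve (timestep, CPU time, reported average) triples copied from the post:

```python
# (timestep, cumulative CPU seconds, reported sec/TS) copied from the trickle list,
# oldest trickle first.
TRICKLES = [
    (11819, 20324, 1.7196),
    (23339, 39858, 1.7078),
    (34859, 59465, 1.7059),
    (46379, 79424, 1.7125),
    (57899, 99361, 1.7161),
    (69419, 119401, 1.7200),
    (80939, 139458, 1.7230),
    (92459, 159741, 1.7277),
    (103979, 179918, 1.7303),
    (115499, 199816, 1.7300),
    (127019, 219603, 1.7289),
    (138539, 239327, 1.7275),
]


def seconds_per_timestep(timestep, cpu_seconds):
    """Cumulative average: total CPU seconds so far divided by timesteps so far."""
    return cpu_seconds / timestep


def interval_rates(trickles):
    """Average sec/TS within each interval between consecutive trickles."""
    rates = []
    for (ts0, cpu0, _), (ts1, cpu1, _) in zip(trickles, trickles[1:]):
        rates.append((cpu1 - cpu0) / (ts1 - ts0))
    return rates


if __name__ == "__main__":
    for timestep, cpu, reported in TRICKLES:
        computed = round(seconds_per_timestep(timestep, cpu), 4)
        print(f"timestep {timestep:>7}: computed {computed:.4f}, reported {reported:.4f}")
    print("per-interval rates:", [round(r, 4) for r in interval_rates(TRICKLES)])
```

The per-interval rates are the more useful figure when looking for a slowdown partway through a run, since the cumulative average smooths it out.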
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Next WU failed. Trying again with all other projects suspended and only one CPDN WU processing.... |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
That is a strange amount of memory you have - 14293.39 MB. Have you done a memtest? Also, Win7 has a built-in "Memory Diagnostics Tool" if you want to give it a test. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
Memory tests normally. Windows reports 14GB of memory. |
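The two memory figures in this exchange are consistent with each other if the megabytes are binary (1 GB = 1024 MB). That unit convention is an assumption here, not something confirmed in the thread. A quick Python check:

```python
REPORTED_MB = 14293.39   # figure quoted from the host details
NOMINAL_MB = 16 * 1024   # a 16 GB machine, for comparison (hypothetical baseline)


def mb_to_gb(megabytes):
    """Convert megabytes to gigabytes, assuming binary units (1024 MB per GB)."""
    return megabytes / 1024


if __name__ == "__main__":
    print(f"{REPORTED_MB} MB is about {mb_to_gb(REPORTED_MB):.2f} GB")
    print(f"Difference from 16 GB: {NOMINAL_MB - REPORTED_MB:.2f} MB")
```

So 14293.39 MB rounds to the 14 GB that Windows shows. The sketch says nothing about why the total is not a round 16 GB.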
Send message Joined: 17 Aug 05 Posts: 22 Credit: 16,057,688 RAC: 15,434 |
Hello Art, out of curiosity, how hot does your CPU run? And is there good enough airflow inside your case? Is it a stock CPU cooler? I don't use Windows myself anymore, but there are apps for temperature measurements. If your CPU is hotter than, say, 65 degrees Celsius, it could interfere with stability IMO. Mine is just under 50 (it's an older i7, I admit), but seems rock stable. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Are you using the internal Intel graphics adapter? I found a problem (posted on this board) a couple of years ago on my i5-3550 machine (Biostar Z77 motherboard), where all the CPDN work units errored out unless I disabled the internal graphics adapter in the BIOS and used only a PCIe card (Nvidia GTX 970 now, but it probably does not matter which). That was before WAH2, but the error rate was 100% on all the work units at the time, so I presume it still applies. Whether that is true on all hardware is another matter, but if you have an external card, I would give it a try. |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
I have an external graphics adapter -- NVIDIA GT620 |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
My CPUs (all 8) run between 42 and 50 degrees Centigrade. As background, I've been running BOINC for years on this machine. I run 5 different projects managed by BOINC across the 8 processors on my Intel i7. The projects I run are CPDN, Einstein@Home, Enigma@Home, SETI@Home, and Milkyway@Home. If I enable all the projects and let them run, everything runs fine -- except for the CPDN work units, which inevitably all fail (since approx April 2015). All other projects run fine. What I've done (so far) is demonstrate that if I suspend all other projects except for a single work unit in CPDN, the single CPDN work unit will finish. I'm trying that one more time on a single CPDN work unit to verify. After that I will try multiple CPDN work units (with all other projects suspended). I suspect that (for some reason) starting in April 2015, something happened which started causing failures in CPDN processing. This feels like some kind of interaction problem within BOINC -- but whatever it is, it only affects CPDN work units... more later after more testing. (This could also be any number of other things, including some strange Windows 7 interaction with an update in March/April 2015.) I'll continue to try to see if I can determine which combination of BOINC projects causes the CPDN work units to fail... I'm currently running BOINC 7.6.29... but I've had the same problem back to 7.0.44 as best I can tell. I tried to go back to 7.0.44 to see if the problem goes away, but it surprisingly did not... Art Masson St. Charles, IL USA |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
That is a good, methodical approach. I have had Folding on the GPU interfere with some of the older CPDN tasks, but not with WAH2. And I don't know whether the BOINC GPU projects you are running could do it too. You may be the first to find out. (Einstein, POEM and GPUGrid are no problems on the GPU for me though). |