Message boards : Number crunching : Folding@Home not compatible with HadCM3 shorts
Author | Message |
---|---|
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I have found that running Folding@Home on the GPUs and CPDN on the CPUs causes the HadCM3 short work units to error out after about 10 to 13 minutes. At least that is how it goes on my dual GTX 750 Tis that I use for Folding, while running CPDN with BOINC 7.6.1 (Win7 64-bit). Normally, Folding gets along fine with the various BOINC projects I run, so that is somewhat of a surprise. I have not tried to isolate it further to see whether it is the Folding client software or the Folding cores and work units themselves (Core 17 at the moment) that cause the problem. But this raises the possibility that other activity on the GPUs (games, video editing, etc.) could cause problems too, so if you seem to be getting too many errors, you might try disabling those and see if it reduces the CPDN error rate. I haven't looked into the other CPDN work unit types, but there might be problems there too. Normally, I like to run BOINC as a service, which pretty much isolates it from other software, but that does not work with the HadCM3 shorts or, as I recall, with the HadAM3P-HadRM3P Pacific North West work units. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Your problem could be the version of BOINC that you are running. I have no problem running SETI (1 task at a time) on the GPU and CPDN (4 tasks on 2 hyperthreaded cores) using BOINC 7.4.42. It may be that BOINC 7.6.1 isn’t stable under that kind of load. Also, have you checked core temps? Excessive heat buildup might cause instability. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I should have mentioned that running BOINC projects on the GPU does not cause the problem. At least Einstein and POEM don't, and I have run a lot of GPUGrid too without incident. It is just Folding, which uses its own client. That normally is an advantage, since I can then run BOINC as a service, which is a bit more stable in some cases unrelated to the current situation. (The core temps etc. are fine.) This machine runs Folding (FAH Client 7.4.4) on two GTX 750 Tis: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1363431 This machine does not run Folding, but runs Einstein and POEM on two GTX 750 Tis (BOINC 7.6.1): http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1349694 They are otherwise similar Haswell boards (Z87/Z97), with nothing overclocked. If you check, most of the errors are "no resubmissions", which I aborted in some cases. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
No resubmission means that task isn't needed/required. BOINC has a problem with recognizing this, and after a period of nnn days re-issues them. If you get any, Abort them. |
Joined: 1 Jan 07 Posts: 1066 Credit: 36,887,369 RAC: 1,533 |
"Your problem could be the version of Boinc that you are running. I have no problem running Seti (1 task at a time) on the GPU and CPDN (4 tasks on 2 hyperthreaded cores) using Boinc 7.4.42. It may be that Boinc 7.6.1 isn’t stable under that kind of load. Also have you checked core temps? Excessive heat buildup might cause instability." v7.6.1 was a botched release, and has already been replaced by v7.6.2, but I think it was more to do with Manager compatibility with Windows 10 than with any problem with the client running applications. |
Joined: 23 Jul 13 Posts: 5 Credit: 176,000 RAC: 0 |
Could this be the reason for my errors on this machine? http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1373589 It is running FAH on the 780 Ti and BOINC on the CPU. It is a 24/7 machine with low CPU/system temps and uptimes of around 45 days, with no other errors or issues. Is there a solution for this compatibility issue yet? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
No. The error (from stderr on the Task ID page for each of those models) is: ATM_DYN : INVALID THETA DETECTED. This means that the physics of that particular theoretical world went beyond known limits, and the program terminated the run. They're all "short" models, which were/are set to run near the limits of stability to test something that I've forgotten. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
While Les is undoubtedly correct that "Invalid Theta Detected" usually indicates a bad model, as I have posted before, that is not always the case: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=8003&nowrap=true#51187 Each of the three machines that failed had that error message, even though the same work unit completed successfully on my machine. So it appears that something else was the cause, or it would have failed on my machine too, I would think. I don't know whether Folding could trigger that particular error; as Les says, probably not, though I have not investigated the Folding problems to that extent. I just avoid FAH on that machine now. |
Joined: 23 Jul 13 Posts: 5 Credit: 176,000 RAC: 0 |
Thanks for the replies. I've removed BOINC from that machine now and will keep it for FAH; that BOINC client wasn't getting any more work after those failures either. At least the 2 better machines are running error free. :) |
©2025 cpdn.org