Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion
Author | Message |
---|---|
Send message Joined: 11 May 06 Posts: 4 Credit: 1,008,514 RAC: 0 |
"... Unless you have a backup prior to the 10th I'd suggest you abort it." Thanks for that - it's dead. I'll take a look at 'backups', although I always believed all the maintenance stuff should really be handled automatically by the BOINC wrapper. OT slightly - is there an error FAQ somewhere? It seems that another of my models (7723388), after appearing to be fine, was marked invalid when it uploaded today :( I'd like to get to the bottom of it - esp. if I still have a hidden hardware issue. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"OT slightly - Is there an error FAQ somewhere? It seems that another of my models (7723388) after appearing to be fine has been marked invalid when it uploaded today :(" All the trickles are there, as well as the total model graph. But the stderr out listing shows a problem with the 3rd zip upload. Not sure what to make of it: Unable to resolve filename cpdnout3.zip </stderr_txt> <message> <file_xfer_error> <file_name>hadsm3fub_k85l_005975356_1_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> "I'd like to get to the bottom of it - esp. if I still have a hidden hardware issue." If you suspect a hardware issue, I'd just run the latest version of prime95 on all 4 cores for a day or two. I typically run the blend test since it's more memory intensive, and cpdn is certainly memory dependent. |
Send message Joined: 11 May 06 Posts: 4 Credit: 1,008,514 RAC: 0 |
OK - I think I know what happened. I used the local preferences dialog to limit the number of CPU cores in use to 3. In the Windows version of BOINC this dialog fills in defaults for many fields - including those on other tabs - and had limited the disk space available to BOINC to 2GiB. Hence there was no space available to create/retain the file. It turns out I was right on the 2GiB usage border, as other projects also take a bite out of this space. I have since upped the limit to 20GiB to prevent this happening again. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi David, a couple of your models crashed with -107 errors, e.g. this one. This could perhaps have been caused by a graphics problem completely unrelated to anything else you've mentioned. Have a look in the project READMEs (link in my sig) at the collection about crashes and problems. In that collection, see item #6 by Mike and item #7 by Thyme Lawn. |
Send message Joined: 11 May 06 Posts: 4 Credit: 1,008,514 RAC: 0 |
"A couple of your models crashed with -107 errors eg this one." Hi, I'm almost 100% certain that those were caused by dodgy memory - I swapped out 4 sticks of Ballistix non-ECC for 4 sticks of ECC around 21st Jan 09. That WU crashed 9th Dec 08. It's a pity that there was no visible alert that something was wrong, else I'd have taken corrective action sooner. Any crashes before that point I'd like to either run again or mark up somehow with an explanation, but we don't have those facilities. FWIW - a second machine (htpc) also had the same issues but with a completely different graphics card. It too now has ECC RAM and has settled down. One last thought - I also run the SETI CUDA apps on this PC - could they interfere? |
Send message Joined: 29 Jan 08 Posts: 1 Credit: 1,126,397 RAC: 0 |
My first Iceworld. I am going to abort it unless you give me different instructions. 1. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8357089 2. 153306 of 259248, phase 4/4 (14/10/2059) 3. Hours Elapsed: 0353:52:50 (1.37 s/TS) 4. The Blue Planet. Totally. 5. i7 CPU 920 @ 2.67GHz [Intel64 Family 6 Model 26 Stepping 4] HT: On 6. Yes, overclocked to 3.5GHz. However, the PC was 24h prime stable (8 instances) at 3.8GHz, so I consider 3.5GHz a safe "production speed". |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
"My first Iceworld. I am going to abort it unless you give me a different instructions." Your task has slowed to about 15 s/TS on the last trickle, from a regular 1.1 s/TS. Its partner task 8357085, on the same WU, has started processing at a faster s/TS from the same place in phase 4. Looks like a candidate to abort. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I see you've now aborted that iceworld, Mican. Its whole workunit #6278040 is rather interesting. A model on a machine with Linux crashed at about the same point. The model Hagar mentioned that seems to have speeded up was on a Mac. It crashed. Conan has a model from the same workunit running on Linux and Verstapp has one on Intel/Windows, both further behind. Both Conan and Verstapp are contactable. I'll ask them to watch their models to see what happens well into Phase 4. It will be particularly interesting to see whether Conan gets an iceworld on Linux. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
David Raison asked 'One last thought - I also run the SETI cuda apps on this PC - could they interfere?' I haven't seen any reports that using a video card for GPU crunching causes graphics problems in tasks running at the same time on the CPU. I expect BOINC CUDA tasks, like tasks running on the CPU, also run at low priority, reining back the moment the resource is needed for some other job. But thanks for raising the question because it's a possibility we need to watch out for. |
Send message Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
"I see you've now aborted that iceworld, Mican. Its whole workunit #6278040 is rather interesting. A model on a machine with Linux crashed at about the same point. The model Hagar mentioned that seems to have speeded up was on a Mac. It crashed." G'Day mo.v, just an update: I took a quick look at Verstapp's work unit, as it has reached phase 4 and I am still at phase 1. The last trickle that he uploaded has just taken a large drop in speed, from a consistent 1.2 s/TS to 1.33 s/TS, so perhaps he has also gone into an iceworld? |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I bet you're right. I don't think he can have seen my PM about it on this forum so I'll send him another on the independent one where he visits most days. He has a lot of computers and may not be able to look at all his models' graphics very often. |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
The task 8357088 has gone from consistent 1.2 s/TS to 12.4 s/TS on the last trickle at Phase 4 TS 151228. |
Send message Joined: 1 Feb 09 Posts: 1 Credit: 105,577 RAC: 0 |
Hello, I think I have an iceworld/blue earth problem. I've been running 2 Hadsm3mh models simultaneously. Details are: Model 1 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8314591 Timestep 159890 2.30s/TS Temp display is blue Windows Vista Home Premium: Intel Core 2 Quad Q9400@2.66GHz No overclocking Model 2 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8314609 Timestep 49681 2.83s/TS Temp display is blue Windows Vista Home Premium: Intel Core 2 Quad Q9400@2.66GHz No overclocking Any thoughts would be appreciated. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Yep. It looks like both of them are "iceworlds". On one of yours, two other Intel/Windows PCs running models from that work unit also slowed down at the same point. On the other, two other Intel/Windows PCs running models from that work unit haven't returned a trickle for a while, after reaching the point where yours slowed down. Best to abort them. Bad luck to get two of them at the same time. Good luck with your next models. |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
"Hello, I think i have an iceworld/blue earth problem. ..... Any thoughts would be appreciated." Your task 8314591 has slowed dramatically since 13th Feb 2009. It is from WU 6270500. One task from that WU has completed: 8314609 on Intel/Linux. Two tasks have slowed dramatically at the same point in phase 4 as yours: 8314585 and 8314586, both on Intel/Windows. Your task 8314609 has slowed dramatically since 20th Feb 2009. It is from WU 6270502. One task from that WU has completed: 8314603 on Intel/Linux. One task from that WU is near completion: 8314605 on AMD/Vista. One task has slowed dramatically at the same point in phase 4 as yours: 8314607 on Intel/Windows. They both look like candidates to abort. Good luck with the new ones. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Graham, you may like to replace one or both aborted iceworlds with the new type of model, HadAM3P. None of these turn into iceworlds as far as we know. You may need to select this model in the climateprediction section of your account. They're described in the forum news thread at the top of Number Crunching. Just don't let your computer run 4 of them at any time because too many slow each other down. They run very nicely alongside HadSM and HadSMMH models. |
Send message Joined: 17 Feb 09 Posts: 31 Credit: 1,505,895 RAC: 529 |
Looks like I have an iceworld. All blue on the graphics, and it occurred within the last 24 to 48 hrs. A link to the model/ResultID webpage - Task ID is number 7776737 A current timestep of that model (on the globe graphic) - Time step is 08 08 1830 The s/TS value (on the globe graphic. Remember, you can hit the Z key while viewing the globe and it will give you this additional text/status information.) - 1.67 Whether the temperature display of the globe graphic is blue. - Temperature display is blue, in the -30 to -36 range What your processor/CPU and Operating System is (i.e. Intel or AMD on Windows or Linux) - Intel(R) Pentium(R) Dual CPU E2180 @ 2.00 GHz, 2.00 GB RAM on Windows Vista Whether you are overclocking. - No overclocking. Should this model be continued or aborted? |
Send message Joined: 17 Feb 09 Posts: 31 Credit: 1,505,895 RAC: 529 |
My apologies about the time step. It is 80997 of 259248. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I keep a record of the time that trickles are generated, and if I find that a model has missed the scheduled time of a few trickles and the temp is blue, then I just abort it and get another one. Since upgrading from a P4 to a quad core processor, it's just not worth the hassle of restoring a backup for 1 out of 4 models. There are millions of combinations to test. Backups now are only in case I have a power failure or some such. And even then I find that the computer and the models recover OK. With your model, 2 other crunchers appear to have passed where you were, so with a bit of luck one or more of them will run the full length. I'd suggest aborting it. |
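
The slowdown check described in this thread (a task running at a steady s/TS that suddenly jumps to many times that value) can be sketched roughly as below. This is only an illustration, not a CPDN or BOINC tool: the function name `find_slowdown`, the 5x threshold, and the 5-trickle baseline window are all assumptions chosen for the example, and the s/TS values have to be read off the task's trickle page by hand. A slowdown on its own is not proof of an iceworld; the posts above also check that the temperature display has turned blue.

```python
from statistics import median


def find_slowdown(sec_per_ts, factor=5.0, baseline_n=5):
    """Return the index of the first trickle whose s/TS is more than
    `factor` times the median of the previous `baseline_n` trickles,
    or None if no such jump is found.

    `factor` and `baseline_n` are illustrative assumptions, not values
    used by the project.
    """
    for i in range(baseline_n, len(sec_per_ts)):
        baseline = median(sec_per_ts[i - baseline_n:i])
        if baseline > 0 and sec_per_ts[i] > factor * baseline:
            return i
    return None


# Hypothetical trickle history: steady at about 1.2 s/TS, then 12.4 s/TS.
rates = [1.2, 1.2, 1.19, 1.21, 1.2, 1.2, 12.4]
slow_at = find_slowdown(rates)
if slow_at is not None:
    print(f"Slowdown at trickle {slow_at}: {rates[slow_at]} s/TS")
```

Using the median of recent trickles as the baseline keeps a single odd trickle (for example, one recorded while the PC was busy with something else) from shifting the comparison much.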