Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion
Author | Message |
---|---|
Send message Joined: 5 Jan 05 Posts: 4 Credit: 1,544,444 RAC: 0 |
Hi again, another unit has turned into an iceball, I suppose. After 19 days it is only 9.8% finished and the s/TS is growing. 8233928 |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Two others in that WU, both Intel Pentium 4s or later with Windows, are also in trouble at the same point. A Pentium 3 in Windows has gone past that point and seems to be doing well (as well as a Pentium 3 can do at 6 s/TS). |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I'll send private messages to the people with that workunit who need to be warned. (Not that I've ever received a response to this sort of message, of which I must have sent dozens.) Cpdn news |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Another frozen earth here, which the user has aborted. Two other Intel/Windows computers in that work unit are nearing the freeze point. Unfortunately, I am unable to PM one of them...not that PMs do much good in these situations since, by default, no one is ever notified of a PM via e-mail. |
Send message Joined: 14 May 08 Posts: 29 Credit: 776,852 RAC: 0 |
Another Ice Age. I could not get it restarted and I had not made a backup, so I aborted it. Also, a 'dirty' power interruption caused another host to crash 3 models: #1, #2, #3. All three reported immediately on computer restart. Unfortunately I had been doing some rearranging and accidentally left that host off the UPS; that is now fixed, but too late for those 3 and some Seti WUs. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Unless you have suspicions that an iceworld was caused by an instability in the computer, it isn't worth restoring even if you have a backup. They almost invariably crash again at the same point. There's another Windows machine in the same WU which will probably hit iceworld conditions within the next trickle or two, so I'll send its owner a PM. Cpdn news |
Send message Joined: 16 May 07 Posts: 2 Credit: 0 RAC: 0 |
I am posting to report that hadsm3mh_kl60_006003796_8 went iceworld at 97.843% completion. Other information provided because another post said it was providing this requested information: 1. A link to the model/ResultID webpage http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8132659 2. A current timestep of that model (on the globe graphic) 236881 of 259248 Date 16/08/2064 00:30 Sorry, not sure when last trickle was. 3. The s/TS value (on the globe graphic. Remember, you can hit the Z key while viewing the globe and it will give you this additional text/status information.) Hours Elapsed: 0866:51:21 (3.08 s/TS) 4. Whether the temperature display of the globe graphic is blue. Yes. Entirely. 5. What your processor/CPU is (i.e. Intel, AMD) GenuineIntel Intel(R) Pentium(R) 4 CPU 3.00GHz [x86 Family 15 Model 4 Stepping 1] [fpu tsc pae nx sse sse2 mmx] 6. Whether you are overclocking. No. Question: About how long should I expect the remaining 2.2% to take to complete? Thanks |
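The reported 97.843% can be cross-checked against the timestep count on the globe graphic. The sketch below is only a rough check, and it rests on assumptions not stated in the report: that the run consists of four equal phases of 259,248 timesteps each, that this model was in its fourth phase, and that the percentage is counted across all four phases. The function name `overall_fraction` is made up for illustration.

```python
# Assumptions (not confirmed by the report): four equal phases,
# 259,248 timesteps per phase, progress counted over the whole run.
TIMESTEPS_PER_PHASE = 259_248
PHASES = 4


def overall_fraction(phase: int, timestep: int) -> float:
    """Fraction of the whole run completed.

    phase is 1-based; timestep is the count within that phase.
    """
    completed_phases = phase - 1
    return (completed_phases + timestep / TIMESTEPS_PER_PHASE) / PHASES


# Timestep 236881 of 259248, assumed to be in phase 4.
mike = overall_fraction(phase=4, timestep=236_881)
print(f"{mike:.3%}")
```

Under those assumptions the result agrees with the 97.843% figure in the report, which suggests the timestep counter restarts in each phase.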
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
"... Question: About how long should I expect the remaining 2.2% to take to complete?" The last trickle was timestep 226,842 in phase 4, so there are three trickles to go. The machine was processing at about 2 seconds/timestep (after the other model had finished). Based on another HADSM3MH model that went iceworld, it might take 10-11 days for each trickle to complete. So, that would be about a month for the final three trickles! That's why most people who are aware of an iceworld just abort it, since they could finish a number of entire models in the time it takes to finish one iceworld. However, it will eventually get to the end if you let it run. |
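The "about a month" estimate can be reproduced with a little arithmetic. This is a rough sketch, not an official CPDN calculation. The trickle interval of 10,802 timesteps is an assumption (it is the phase length divided by 24, and matches the gap between consecutive trickle timesteps quoted elsewhere in this thread); the 2 s/TS and 10-11 days per trickle are the figures from the post above.

```python
TIMESTEPS_PER_PHASE = 259_248  # total shown on the globe graphic
LAST_TRICKLE_TS = 226_842      # last trickle timestep in phase 4
TRICKLE_INTERVAL = 10_802      # assumption: timesteps between trickles

remaining_ts = TIMESTEPS_PER_PHASE - LAST_TRICKLE_TS
trickles_left = remaining_ts / TRICKLE_INTERVAL

# A healthy model at roughly 2 s/TS would finish within a day.
healthy_hours = remaining_ts * 2.0 / 3600

# An iceworld at 10-11 days per trickle takes about a month.
iceworld_days = (trickles_left * 10, trickles_left * 11)

print(f"trickles left: {trickles_left:.0f}")
print(f"healthy model: about {healthy_hours:.0f} hours")
print(f"iceworld: {iceworld_days[0]:.0f}-{iceworld_days[1]:.0f} days")
```

The gap between the two estimates (hours versus weeks) is the practical argument for aborting.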
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I think you should abort it. When an iceworld develops, as far as we know the model always stops producing some of the data. Usually iceworlds stop producing the precipitation graphs. So the researchers can no longer use your model. Cpdn news |
Send message Joined: 16 May 07 Posts: 2 Credit: 0 RAC: 0 |
"The last trickle was timestep 226,842 in phase 4, so there are three trickles to go. The machine was processing at about 2 seconds/timestep (after the other model had finished)." "I think you should abort it. When an iceworld develops, as far as we know the model always stops producing some of the data. Usually iceworlds stop producing the precipitation graphs. So the researchers can no longer use your model." Though it seems quite a shame to lose all that computing time, if the results aren't usable when they are finished then there isn't much sense in waiting around for them. Thank you both, -Mike |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Mike, if you look at the model's web page and inspect the graphs, you'll see that they were excellent for the first three phases. It's only from the point when the iceworld develops that the model stops processing its data correctly. So only your crunching for the last phase was lost. My apologies for not making that clear before. Cpdn news |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The phase 4 trickle history for Mike's task exhibits a significant speed up before it slowed down. I've not spotted that pattern before. After the 8th trickle the average sec/TS started falling, from 3.1156 to 2.9757. That works out at 2.1150 sec/TS over the last 13 trickles, approximately a third faster than before. The 12th trickle took an average of 1.9839 sec/TS and the 13th 2.3502 sec/TS, suggesting that the slowdown started a bit before that trickle. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
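One way to arrive at a figure like the 2.1150 sec/TS above is to back it out of two cumulative averages. The sketch below is a guess at the method, not the poster's confirmed calculation. It assumes the averages on the trickle page are cumulative over the whole run, that every trickle covers the same number of timesteps, and that each phase has 24 trickles, so the 8th trickle of phase 4 is the 80th overall. The helper name `interval_average` is made up for illustration.

```python
def interval_average(avg_before: float, n_before: int,
                     avg_after: float, n_after: int) -> float:
    """Mean sec/TS over the trickles between two cumulative averages.

    Assumes every trickle covers the same number of timesteps, so the
    cumulative average is a plain mean over trickles.
    """
    total_after = avg_after * n_after
    total_before = avg_before * n_before
    return (total_after - total_before) / (n_after - n_before)


TRICKLES_PER_PHASE = 24                  # assumption
n_before = 3 * TRICKLES_PER_PHASE + 8    # 8th trickle of phase 4
n_after = n_before + 13                  # 13 trickles later

recent = interval_average(3.1156, n_before, 2.9757, n_after)
speedup = 1 - recent / 3.1156

print(f"average over last 13 trickles: {recent:.4f} sec/TS")
print(f"time per timestep reduced by about {speedup:.0%}")
```

Under these assumptions the result lands within rounding distance of the quoted 2.1150 sec/TS, and the reduction is roughly a third.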
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
... that's just because the other task on the hyperthreaded 3 GHz P4 finished at that point. There appear to be two 'user' accounts: the relevant one is here. |
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I think my task 25/01/2009 22:34:28 hadsm3fub_k8n4_005975987_5, using hadsm3 version 607, has gone iceworld. It's reached 6/8/1824 and the globe is now wholly blue. My other model (coupled) is happily displaying the usual sunshine and clouds. I'm planning to abort, but not today! |
Send message Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Yes, there's one model further on and it's slowed down at the same point. I'll send a message to the other people in that unit, advising them to abort. Thanks for reporting that. |
Send message Joined: 3 Jan 09 Posts: 9 Credit: 633,446 RAC: 0 |
I am posting to report that hadsm3mh_kj7q_006010388 went iceworld at 99.371% completion. It is still running, but extremely slowly. BOINC Mgr is estimating 5 hrs to completion, but by my calculations it'll be closer to 6 days. Using a previous example of reporting an Ice World: 1. A link to the model/ResultID webpage http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=8284151 2. A current timestep of that model (on the globe graphic) 252734 of 259248 Date 16/07/2065 07:30 3. The s/TS value Hours Elapsed: 0530:27:16 (1.85 s/TS) - per the last trickle the s/TS had been 1.7036 on 2/5/09 at 00:44:46 UTC. 4. Whether the temperature display of the globe graphic is blue. Yes. Entirely. 5. What your processor/CPU is (i.e. Intel, AMD) GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [x86 Family 6 Model 15 Stepping 7] 6. Whether you are overclocking. Yes to 2.70GHz. --------------- Thanks, Jack |
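To see how far apart the two completion estimates are, each can be turned back into the s/TS rate it implies. This is only a back-of-the-envelope sketch using the timestep counts from the report. It does not reproduce how BOINC Manager actually computes its estimate, and the helper name `implied_rate` is made up for illustration.

```python
TOTAL_TS = 259_248     # total shown on the globe graphic
CURRENT_TS = 252_734   # current timestep in the report

remaining_ts = TOTAL_TS - CURRENT_TS


def implied_rate(duration_seconds: float) -> float:
    """Seconds per timestep needed to finish in the given time."""
    return duration_seconds / remaining_ts


boinc_rate = implied_rate(5 * 3600)    # BOINC Mgr: 5 hours
own_rate = implied_rate(6 * 86_400)    # poster's estimate: 6 days

print(f"remaining timesteps: {remaining_ts}")
print(f"5 hours implies {boinc_rate:.2f} s/TS")
print(f"6 days implies {own_rate:.1f} s/TS")
```

The 5-hour figure implies a rate close to the pre-iceworld speed, while the 6-day figure implies a rate dozens of times slower, consistent with an estimator that has not yet caught up with the slowdown.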
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Jack, thanks for reporting this iceworld. That's very frustrating, particularly so near the end. The graphs for the first 3 phases all look good. I see you've aborted it now, which was the right thing to do. A Mac has completed a model from the same WU but we know that iceworlds are far more common on Intel/Windows computers. So comparing your model with the one on the Mac tells us nothing useful. The only model in that WU that might give you a clue about whether your model was inherently unstable or the iceworld could have been caused by an instability in your computer is this one, also running on Intel/Windows. But that computer's sending in trickles so infrequently that you won't get an early answer. If your computer produces several iceworlds you could consider testing it for stability at that level of O/C if you haven't already done so. Or you could select non-HADSM model types for it. Cpdn news |
Send message Joined: 14 May 08 Posts: 29 Credit: 776,852 RAC: 0 |
I wouldn't hold my breath; all previous models by that host are compute errors. |
Send message Joined: 11 May 06 Posts: 4 Credit: 1,008,514 RAC: 0 |
I've not had much luck with these models... First off it seems I had some dodgy memory that corrupted the first few models I crunched (replaced 19/1/09 with ECC). Now I have a possible 'ICE' planet :( Task ID: 7736409 Name: hadsm3fub_k95r_005976658_2 Workunit 6188835 Should I abandon this one? |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,428,734 RAC: 10,431 |
"Should I abandon this one?" 7736409 Your model seems to have sped up by about 20% on 10th Feb between the phase 1 trickles at TS 216040 and TS 226842. It went from about 1.09 s/TS to 0.823 s/TS, then to 0.7 s/TS at your last trickle, phase 2 TS 140,246, today at 11:01. One other user has run the WU to completion, under Darwin, with consistent timing. Three others seem to have also errored, though none got as far as yours. Unless you have a backup from before the 10th, I'd suggest you abort it. Good luck with the next model. |