climateprediction.net (CPDN) home page
Thread 'Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion'

Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 32680 - Posted: 19 Feb 2008, 23:51:57 UTC - in response to Message 32670.  

... WU finished and returned at lunchtime UTC today 19/02/2008.
Hope the results are useful to the scientists!...

Looks very interesting: I've not seen one decay like that ...
ID: 32680
[B^S] sTrey
Joined: 9 Jan 05
Posts: 30
Credit: 434,469
RAC: 0
Message 32711 - Posted: 23 Feb 2008, 0:17:36 UTC
Last modified: 23 Feb 2008, 0:27:45 UTC

I think this qualifies, though it's not a hadsm3; it's a hadcm3 optimised I/O model.
Result 7006672
Timestamp 09:30 20/12/2056
s/TS shown as 2.83
blue globe
Intel P4 (Prescott) HT 3.0, not overclocked

This box has been my CPDN workhorse, though it's the first time it's tried one of these optimised I/O coupled models. This is an 80-year model, 70% finished.

I only noticed after a day or so that no trickles had been sent. When I looked at it, the graphics window showed only black. I suspended, quit, zipped a backup of the entire boinc directory and restarted. The display began as normal but was unresponsive and eventually went from a black world to a blue one. The 2.83 sec/TS it reports is no longer real; it's more like 10 minutes/TS.

Any reason not to abort it, or anything else I should be looking at? I've no indications the hardware is ailing, haven't changed hardware or software recently, etc.
ID: 32711
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 32712 - Posted: 23 Feb 2008, 1:30:25 UTC


The HadCM3 generally can't turn into an iceworld due to the way the ocean is modelled. I'd therefore suspect that something else is involved (perhaps a rewind). Could you try rebooting the PC and then checking the %complete and so forth?

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 32712
[B^S] sTrey
Joined: 9 Jan 05
Posts: 30
Credit: 434,469
RAC: 0
Message 32713 - Posted: 23 Feb 2008, 2:42:41 UTC
Last modified: 23 Feb 2008, 3:17:01 UTC

Sure. Data after reboot:

Timestamp 00:30 19-Dec-2056
70.05% done
2.82 s/TS
1140 hours elapsed
Globe is orange again.

So far it's now counting down normally, but there's about a model-day to go before it gets to where it was stuck before.

I'm curious what rebooting changed that restarting boinc did not.

Anyway, since this is apparently the wrong thread, I'll look for a better location for subsequent questions (clues happily accepted).

-edit- it's stuck again in the same place, 09:30 20-Dec-2056, blue globe and crawling.
ID: 32713
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 32715 - Posted: 23 Feb 2008, 3:09:52 UTC
Last modified: 23 Feb 2008, 3:17:51 UTC

I'm not sure if there IS a better thread.
And I seem to remember getting a blue world in a HadCM3 model, but it was in another lifetime, and I don't remember the details. Or late last year, which is much the same thing with my memory. :)

ID: 32715
[B^S] sTrey
Joined: 9 Jan 05
Posts: 30
Credit: 434,469
RAC: 0
Message 32716 - Posted: 23 Feb 2008, 3:11:15 UTC

OK, well, this one is repeatably stuck (see the last edit on my previous post). Is there anything useful I can do with it, or should I just abort it?
ID: 32716
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 32724 - Posted: 23 Feb 2008, 11:35:37 UTC


Sounds like aborting it is the only alternative, sorry...

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 32724
old_user484002
Joined: 23 Nov 07
Posts: 9
Credit: 325,662
RAC: 0
Message 32911 - Posted: 11 Mar 2008, 6:41:51 UTC

It seems I got an iceworld at quite an early stage of modelling a hadsm3 slab:

ResultID 7291577

A current timestep of the model: 131087 of 259248

The s/TS value: 3.26

The temperature display of the globe graphic is uniformly blue; P gives uniform black, R uniform pale blue, and there are no clouds at all.

Processor: 1 GenuineIntel Intel(R) Celeron(R) D CPU 3.06GHz [x86 Family 15 Model 6 Stepping 4]
Processor features: fpu tsc sse sse2 mmx

Whether you are overclocking: it seems not.

Should I try to proceed or abort?
ID: 32911
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 32912 - Posted: 11 Mar 2008, 7:56:28 UTC


If the s/ts is getting worse and worse, then yes, abort it. Running a stability check (such as prime95 for 24 hours) helps to identify whether it is the hardware or the software causing the trouble, although if it happens just a few times it is usually the model. If it happens a lot, and you also get unexplained model crashes, then it is worthwhile running the test.


Note that it is only the HadSM3 model which suffers from slow-running iceworlds, so perhaps running one of the other model types may be less frustrating if you keep getting issues with slabs :-)

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 32912
rbpeake
Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 32913 - Posted: 11 Mar 2008, 13:33:57 UTC - in response to Message 32912.  


If the s/ts is getting worse and worse, then yes, abort it. Running a stability check (such as prime95 for 24 hours) helps to identify whether it is the hardware or the software causing the trouble, although if it happens just a few times it is usually the model. If it happens a lot, and you also get unexplained model crashes, then it is worthwhile running the test.

My hardware ran Einstein@home just fine, but with these climate models I have had some equipment freezes (but no ice world freezes, thank goodness! ;), and have had to throttle back my overclocking several times. The freezes are less and less frequent with each throttle back, and hopefully my latest larger throttle back will work on a long-term basis! ;)
Regards,
Bob P.
ID: 32913
old_user484002
Joined: 23 Nov 07
Posts: 9
Credit: 325,662
RAC: 0
Message 32914 - Posted: 11 Mar 2008, 14:05:55 UTC - in response to Message 32912.  


If the s/ts is getting worse and worse, then yes, abort it. Running a stability check (such as prime95 for 24 hours) helps to identify whether it is the hardware or the software causing the trouble, although if it happens just a few times it is usually the model. If it happens a lot, and you also get unexplained model crashes, then it is worthwhile running the test.


Really interesting effect: s/TS is now 3.43 and the timestep is counting DOWN! (131068 vs 131087)

Can I do anything else to investigate it?

It's the first time this has happened, so a hardware problem is unlikely.

ID: 32914
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 32915 - Posted: 11 Mar 2008, 14:23:20 UTC - in response to Message 32914.  
Last modified: 11 Mar 2008, 14:33:02 UTC

[ask_spb wrote:]... Can I do anything else to investigate it? ...

One other thing is to watch the progress of similar PCs in that work unit: 6143336.

So far, one PC has gone further but it's Linux. When these slabs slow down, a trickle may take as long as a week - I suspect a few more people in that work unit will start to notice something wrong soon.

[Edit: The guy above you in the work unit list is the one to watch as he's got the fastest machine.]
ID: 32915
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 32919 - Posted: 11 Mar 2008, 16:54:35 UTC

Sometimes restoring a backup from before the slowdown started is worthwhile. If you do this you need to keep an eye on the graphics to see whether the same thing happens again. If it does, the only sensible solution is to abort.

If the timestep goes back, the model could also perhaps be looping.

Re overclocking: when CPDN ran its first Classic slabs, one of the then programmers in Oxford posted on the forum that about 2½% of completed models were failing quality control, i.e. they could not be used by the researchers even though the models had earned their credits. He added that one of the main causes was over-enthusiastic overclocking. Quality control is still carried out.

(Irrelevant to this thread but interestingly, he also said that differences between AMD/Intel and OSs led to only insignificant differences in model outcomes.)
Cpdn news
ID: 32919
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 32920 - Posted: 11 Mar 2008, 17:00:07 UTC
Last modified: 11 Mar 2008, 17:11:01 UTC

John Hunt persevered and completed a slow iceworld. Look at the graphs:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7209853

Chris Beaugrand also persevered with the same workunit. Again, look at the graphs:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7209854

Another member abandoned the same model after the slowdown started. It must be concluded that the model itself was unviable.

But if you find that somebody else has got beyond your slowdown point without a problem, you need to consider whether your computer could perhaps be slightly unstable.

Cpdn news
ID: 32920
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 32922 - Posted: 11 Mar 2008, 18:42:38 UTC - in response to Message 32913.  

...
My hardware ran Einstein@home just fine, but with these climate models I have had some equipment freezes (but no ice world freezes, thank goodness! ;), and have had to throttle back my overclocking several times. The freezes are less and less frequent with each throttle back, and hopefully my latest larger throttle back will work on a long-term basis! ;)


For overclocking, I'd recommend a full 24 hours of Prime95 before running the climate model (one copy running for each core you have - so on a quad core you'd have 4 copies, using -A0, -A1, -A2, and -A3). It took a lot of work before I could get my Q6600 working properly overclocked.



I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 32922
old_user428438
Joined: 1 Feb 07
Posts: 26
Credit: 885,216
RAC: 0
Message 32924 - Posted: 11 Mar 2008, 19:52:48 UTC - in response to Message 32922.  

...
My hardware ran Einstein@home just fine, but with these climate models I have had some equipment freezes (but no ice world freezes, thank goodness! ;), and have had to throttle back my overclocking several times. The freezes are less and less frequent with each throttle back, and hopefully my latest larger throttle back will work on a long-term basis! ;)


For overclocking, I'd recommend a full 24 hours of Prime95 before running the climate model (one copy running for each core you have - so on a quad core you'd have 4 copies, using -A0, -A1, -A2, and -A3). It took a lot of work before I could get my Q6600 working properly overclocked.



@Mike: V25.5 of Prime95 automatically loads all the cores that you have - so no need to run multiple instances.

F.
ID: 32924
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 32925 - Posted: 11 Mar 2008, 21:00:38 UTC - in response to Message 32924.  

...
@Mike: V25.5 of Prime95 automatically loads all the cores that you have - so no need to run multiple instances.

F.


True, but the front page of http://www.mersenne.org only offers 24.14 (that being the last official release), and I'm reluctant to point people towards the alpha / 'pre-beta' versions in the forums...

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 32925
old_user484002
Joined: 23 Nov 07
Posts: 9
Credit: 325,662
RAC: 0
Message 32930 - Posted: 12 Mar 2008, 14:22:05 UTC - in response to Message 32915.  

[ask_spb wrote:]... Can I do anything else to investigate it? ...

One other thing is to watch the progress of similar PCs in that work unit: 6143336.

So far, one PC has gone further but it's Linux. When these slabs slow down, a trickle may take as long as a week - I suspect a few more people in that work unit will start to notice something wrong soon.

[Edit: The guy above you in the work unit list is the one to watch as he's got the fastest machine.]


My model has definitely got stuck in a loop: it is again showing timestep 131070 (before that it was 131087, 131069, 131068). s/TS is growing: 4.06. It seems all the Windows simulations looped at the same timestep, somewhere after 129,624, while the Linux simulation passed it successfully.

So I had better try another task :)
ID: 32930
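
The check described in the post above (a timestep that goes backwards while s/TS keeps rising) can be written down as a minimal sketch. The function names are hypothetical and not part of BOINC or CPDN; the readings are the ones quoted in this thread, and the order of the timesteps is assumed to be the order in which they were listed.

```python
from typing import List, Tuple


def find_rewinds(timesteps: List[int]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where the timestep went backwards."""
    return [(prev, cur) for prev, cur in zip(timesteps, timesteps[1:]) if cur < prev]


def is_slowing(s_per_ts: List[float]) -> bool:
    """Return True if each s/TS reading is higher than the one before it."""
    if len(s_per_ts) < 2:
        return False
    return all(cur > prev for prev, cur in zip(s_per_ts, s_per_ts[1:]))


# Readings quoted in the thread; the order of the timesteps is an assumption.
observed_timesteps = [131087, 131069, 131068, 131070]
observed_s_per_ts = [3.26, 3.43, 4.06]

rewinds = find_rewinds(observed_timesteps)
slowing = is_slowing(observed_s_per_ts)
print("rewinds:", rewinds)
print("s/TS rising:", slowing)
```

Any rewind combined with a steadily rising s/TS matches the pattern the moderators describe as a reason to abort; a single rewind on its own may just be the model restarting from a checkpoint.
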
old_user511233
Joined: 7 Apr 08
Posts: 4
Credit: 28,086
RAC: 0
Message 33279 - Posted: 10 Apr 2008, 20:28:48 UTC

I haven't a clue if this is the place to ask this question, so here goes anyway. I downloaded and installed the BOINC software and attached the climate prediction stuff about a week ago. The computer has accumulated about 23 hours of CPU time but the progress percentage is only about 0.56%. At this rate it will take years to finish the work package and I doubt that the computer will be running 6 months from now.

It's a Windows 2000 system, 2.8 GHz, 2 GB main memory, with an ATI x1300 graphics card.

In the graphics view I get what looks like a reasonable (from my uninformed eye) distribution of temperature, pressure and the like. I don't use the screensaver and don't usually have the graphics screen active. The graphics display updates smoothly.

Is this a typical CPU usage vs. progress percentage for a computer of this speed? If it is, I'll have to terminate the climate prediction task, as it will all be wasted analysis anyway.

Any comments will be appreciated.
ID: 33279
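
As a rough check on the "years" estimate in the post above, the figures can be extrapolated linearly. This is only a sketch under stated assumptions: progress is taken to be linear, "about a week" is taken as 7 days, and the function names are hypothetical. It ignores anything that would change the rate, such as two models sharing one CPU.

```python
def estimate_total_cpu_hours(cpu_hours_so_far: float, percent_done: float) -> float:
    """Extrapolate total CPU hours, assuming progress is linear in CPU time."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return cpu_hours_so_far * 100.0 / percent_done


def estimate_total_wall_days(elapsed_days: float, percent_done: float) -> float:
    """Extrapolate total calendar days at the usage pattern seen so far."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_days * 100.0 / percent_done


# Figures from the post: 23 CPU hours and 0.56% after about a week (assumed 7 days).
total_cpu_hours = estimate_total_cpu_hours(23, 0.56)
total_wall_days = estimate_total_wall_days(7, 0.56)

print(f"total CPU time: {total_cpu_hours:.0f} hours ({total_cpu_hours / 24:.0f} days)")
print(f"total calendar time at current usage: {total_wall_days:.0f} days")
```

On these assumptions the model needs roughly 4,100 CPU hours, which is about 171 days of continuous crunching, but about 1,250 calendar days if the machine only contributes 23 CPU hours per week.
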
Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 33280 - Posted: 10 Apr 2008, 20:38:37 UTC

I'm no expert, Richard, but it looks as if you've downloaded two models instead of one - that would probably slow things up quite a lot? http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=854815
Visit the Scotland team
ID: 33280

©2024 cpdn.org