climateprediction.net (CPDN) home page
Thread 'HadSM3 progressing at snails pace'


Message boards : Number crunching : HadSM3 progressing at snails pace
old_user142879
Joined: 23 Dec 05
Posts: 2
Credit: 254,498
RAC: 0
Message 40070 - Posted: 3 Jul 2010, 5:57:00 UTC

Hi there,

I am running "UK Met Office HadSM3 Slab Model 6.07" on one of the cores of my Lenovo C2Q/8200 machine.

OS: WinXP/SP3

BOINC is running as a service, so I don't see any graphics.

As per your advice in the forum, I ran a complete intensive diagnostic of the PC using the Lenovo ThinkVantage Toolbox, and it came back OK.

Currently it is running the WU hadsm3dhet2_jutz_006602953_5.

50 hours ago it was at:
Elapsed time: 348:48:18
Progress: 89.149%
To complete: 42:23:58

Now it is at:
Elapsed time: 397:59:18
Progress: 89.544%
To complete: 46:25:03

In 49 hours of computation it advanced 0.395%; if my calculations are right, it will need around 1300 hours more to complete.
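
For anyone who wants to check that arithmetic, here is a minimal Python sketch of the extrapolation. It assumes the model keeps advancing at the same rate as over the last interval, which is only an assumption. The helper names `hms_to_hours` and `hours_remaining` are made up for illustration and are not part of BOINC.

```python
def hms_to_hours(text):
    """Convert a BOINC-style 'HHH:MM:SS' string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def hours_remaining(progress_then, progress_now, hours_between):
    """Linearly extrapolate the time left from two progress readings (in %)."""
    rate = (progress_now - progress_then) / hours_between  # percent per hour
    if rate <= 0:
        raise ValueError("no progress between the two readings")
    return (100.0 - progress_now) / rate


# Figures quoted in the post above.
elapsed = hms_to_hours("397:59:18") - hms_to_hours("348:48:18")
remaining = hours_remaining(89.149, 89.544, elapsed)
print(f"{elapsed:.2f} h between readings, roughly {remaining:.0f} h left")
```

At the current rate this comes out at roughly 1300 hours, in line with the estimate above.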

Is this normal behaviour for this application?

As it has already computed for ~400 hours, I would hate to abort it.

I would appreciate any advice.

Regards,

Yair
ID: 40070

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40071 - Posted: 3 Jul 2010, 7:54:05 UTC - in response to Message 40070.  

The sec/TS has jumped suddenly, so it's probably gone "ice world".

I don't think that you have much choice:
1) Continue for, possibly, the rest of the year, with the chance of faulty results.
2) Abort it.

As per the News thread, that model type has been retired, so only FAMOUS (the Millennium model) is available at the moment, with three varieties of a new type in beta testing.


Backups: Here
ID: 40071

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,860,147
RAC: 4,891
Message 40072 - Posted: 3 Jul 2010, 7:54:37 UTC
Last modified: 3 Jul 2010, 7:55:44 UTC

Yair,

Welcome to the CPDN message board.

From the record of that computer it's apparent that you've successfully run all the slab (HADSM3) and mid-Holocene (HADSM3MH) models you've downloaded to completion - so the problem isn't likely to be the computer. The most likely explanation is that the model has become a slow-processing 'iceworld'; depending on model batch, up to 15% become iceworlds.

The model will eventually complete, but the rate of trickle submission might decline from, say, one every few hours to one every week. Unless the model is very close to the end, which yours is not, aborting it is the only option. The computer can then get on with some more useful work - there's a FAMOUS model already downloaded on that machine ready to go.

Iain

PS If you use the message boards 'advanced search' facility to look for the word 'iceworld' over the last year, you'll find some other relevant threads.

[Edit: Oops - Les got there first.]
ID: 40072

old_user142879
Joined: 23 Dec 05
Posts: 2
Credit: 254,498
RAC: 0
Message 40074 - Posted: 3 Jul 2010, 11:11:55 UTC - in response to Message 40072.  

Thanks Iain,

I will probably abort it in a while.

The WU famous_r141_1599_200_006666156_1 listed in my account as being in progress on my computer is not here, so you might want to make it available for download again.

In my project preferences I allowed all experiments. I just checked and they are all still listed as available, including the HadSM3 that Les said is retired.

Would you like me to select only a few of the experiments?

Regards,

Yair
ID: 40074

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 40076 - Posted: 3 Jul 2010, 13:34:32 UTC

Dear Ubdaddy:

Check the “server status” page. You will see that next to all model types except the “famous” model the number ready to send is 0.

Just think, now that the SM’s are retired we will no longer have these interesting discussions about “ice worlds”.

ID: 40076

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40077 - Posted: 3 Jul 2010, 14:59:03 UTC
Last modified: 3 Jul 2010, 15:04:37 UTC

As Les said, the model's speed, or sec/timestep, has jumped dramatically. Here's the model. The situation is far worse than the impression given by the last value, which is a cumulative average over the whole run.
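
To illustrate why a cumulative average understates a sudden slowdown, here is a small Python sketch. The readings are invented for illustration and are not taken from this model's page, and the function name `interval_sec_per_ts` is likewise hypothetical. It recovers the sec/timestep for each interval between two readings from the cumulative averages.

```python
def interval_sec_per_ts(timesteps, cumulative_avg):
    """Recover per-interval sec/timestep from cumulative averages.

    timesteps[i] is the total number of timesteps completed at reading i,
    cumulative_avg[i] is the average sec/timestep over the whole run so far.
    """
    rates = []
    for i in range(1, len(timesteps)):
        total_now = cumulative_avg[i] * timesteps[i]            # seconds so far
        total_prev = cumulative_avg[i - 1] * timesteps[i - 1]   # seconds at previous reading
        rates.append((total_now - total_prev) / (timesteps[i] - timesteps[i - 1]))
    return rates


# Hypothetical readings, NOT real data from this workunit.
timesteps = [100_000, 200_000, 210_000]
cumulative = [2.0, 2.0, 3.0]

rates = interval_sec_per_ts(timesteps, cumulative)
print(rates)
```

With these made-up readings the cumulative average only moves from 2.0 to 3.0 sec/TS, while the last interval actually ran at 23 sec/TS.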

Another computer in the same workunit has completed the model without problems, but it has Linux whereas you have Windows. If an iceworld develops it's almost always on every model in a workunit with the same operating system.

There's another computer in the WU that, like yours, Ubdaddy, also has Intel + Windows. This computer's model is less advanced than yours but will probably become an iceworld at exactly the same processing moment. We'll ask our programmer Milo to send the owner an email to warn him.

Don't let your computer waste more time on this model. If you let it battle on for weeks and weeks, data will probably be missing from its graphs from the moment when the iceworld developed. Abort it. Now.

Thank you for reporting the problem.
Cpdn news
ID: 40077

old_user626520
Joined: 23 Jun 10
Posts: 2
Credit: 13,893
RAC: 0
Message 40091 - Posted: 8 Jul 2010, 14:59:33 UTC - in response to Message 40077.  

Hello. I found this post while searching for the first several characters of the work unit I'm currently running, and wondered whether there's a correlation with my problem.

Here's the info of the unit I have a question about:


7/8/2010 7:56:33 AM climateprediction.net Restarting task hadsm3dhet2_js14_006599322_2 using hadsm3 version 607

When I'm running for a while, I would return to the system showing a black screen with the taskbar showing on the bottom, and several instances of this unit showing, with the windows comment "not responding". I'd also noticed that lately when I would first see the screensaver graphic, the globe was peculiarly without any atmosphere, yet having what seemed like low-lying fog.

Is that what you mean by the calculations turning into an iceworld? Is that what's causing the screensaver graphics to hang? I didn't want to just close them as is, and each time I have resorted to a full system restart; I'm hoping that a graceful shutdown/restart is the safest way to handle the number crunching without data loss?

Thanks. And, please bear with me, I've just begun participating, and this is my first post.

-DP
ID: 40091

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40093 - Posted: 8 Jul 2010, 17:24:53 UTC

Hi Zerepelad, welcome to the forum. Don't worry about being a newbie; if more people posted when they're not sure what's going on, more problems would get sorted out.

Here's the web page for hadsm3dhet2_js14_006599322_2. I see no sign that it's become an iceworld.

There's a description by Geophi of what 'iceworlds' are like here. Your computer has AMD and Windows; with this combination, if the model did turn into an iceworld you would expect to see the processing suddenly speed up, not slow down. The temperature view of the model's globe would show a complete blue circle. The other views, e.g. pressure, would also show just one colour, not the usual moving picture of the weather as it's produced. So if you look at the globe graphics from time to time you'll see whether your model is still progressing normally.

It's getting near the end of Phase 1 of the 3 phases. At the end of each phase it will produce a file to upload. While it's post-processing this file and for 10 or 15 minutes afterwards try not to disturb the model by suspending it or exiting from Boinc. These HadSM models don't like their file-processing to be interrupted.

You said:

When I'm running for a while, I would return to the system showing a black screen with the taskbar showing on the bottom, and several instances of this unit showing, with the windows comment "not responding".


I'm not sure what you mean by several instances showing. It would help if you could describe what happens in more detail please. Is your Boinc manager open when this happens?

I'd also noticed that lately when I would first see the screensaver graphic, the globe was peculiarly without any atmosphere, yet having what seemed like low-lying fog.


I think you may be seeing the Clouds view which can look rather foggy.

If you're often getting this Windows 'not responding' message it probably means that web pages are freezing. This happens to almost everybody from time to time, but if it's a frequent problem it may be that your computer isn't very happy running the screensaver, which you probably have set up to kick in when the computer's been left for a while. The screensaver graphics are rather resource-intensive because they're dynamic, constantly changing; for this reason they slow down the processing of the model. A computer of mine often froze until I disabled the screensaver.

To disable the screensaver:

Right-click on a blue area of the desktop
Select Properties in the menu
In the Display Properties pane choose the Screensaver tab
In the Screensaver drop-down menu select None or a static picture
Click the OK button

You can still view your globe whenever you want by clicking on the View Graphics button in your Boinc manager. Because the globe window viewed this way isn't full-screen like the screensaver, computers don't seem to mind it. It's still not a good idea to leave the globe window open all the time, though.

See whether that suggestion helps.
Cpdn news
ID: 40093

old_user626520
Joined: 23 Jun 10
Posts: 2
Credit: 13,893
RAC: 0
Message 40315 - Posted: 6 Aug 2010, 10:25:22 UTC - in response to Message 40093.  

Thank you!
ID: 40315

©2024 cpdn.org