climateprediction.net (CPDN) home page
Thread 'Again sec/TS is abnormal rising...'

Message boards : Number crunching : Again sec/TS is abnormal rising...

old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10777 - Posted: 12 Mar 2005, 21:39:02 UTC

Hi,

Look at this <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=391890">host and its trickles</a>...

At the beginning of the model run this machine needed about 3.17 sec/TS, but the CPU time consumed kept rising as the model progressed. For the last trickle it needed 5.86 sec/TS to complete the 10802 TS; for the trickle before that it was 5.48 sec/TS.

What causes sec/TS to rise so abnormally? The setup of my machine hasn't changed since the model started.

Thx &amp; Ciao, Tom
ID: 10777
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 10782 - Posted: 13 Mar 2005, 0:16:54 UTC - in response to Message 10777.  
Last modified: 13 Mar 2005, 0:30:19 UTC

I cannot be 100% sure, but the mobile Athlon XP-M in my notebook did the same thing to me at the end of last year.

I eventually found out the poor thing was overheating and being forced into thermal throttling to prevent damage to the CPU, so its performance kept degrading over time
(the sec/TS figure always shows the total average from the beginning to the current point, so the increase in sec/TS is a very gradual, "sneaky" one)
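To see why a running average hides a slowdown, here is a minimal Python sketch. It assumes, as the post says, that the displayed figure is total CPU time divided by total timesteps so far, and it uses made-up per-trickle rates (not data from this thread); the 10802 timesteps per trickle matches the trickle figures quoted in this thread, and the function name is illustrative only.

```python
TS_PER_TRICKLE = 10802  # timesteps between trickles, as in the trickle figures quoted in this thread


def running_average(per_trickle_sec_per_ts):
    """Return the whole-run average sec/TS after each trickle (total CPU time / total timesteps)."""
    total_cpu = 0.0
    total_ts = 0
    averages = []
    for rate in per_trickle_sec_per_ts:
        total_cpu += rate * TS_PER_TRICKLE  # CPU seconds spent on this trickle
        total_ts += TS_PER_TRICKLE
        averages.append(total_cpu / total_ts)
    return averages


# Made-up run: 10 healthy trickles at 3.1 sec/TS, then 5 throttled trickles at 5.0 sec/TS.
rates = [3.1] * 10 + [5.0] * 5
averages = running_average(rates)
for n, (rate, avg) in enumerate(zip(rates, averages), start=1):
    print(f"trickle {n:2d}: this trickle {rate:.2f} sec/TS, running average {avg:.2f} sec/TS")
```

On these made-up numbers the last trickles run more than 60% slower than the first ten, yet the running average has only crept up to about 3.73 sec/TS, which is why a per-trickle figure is the better thing to watch.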

If it is a notebook (I'm not aware of any DTR mobile P4 variants), you might want to check whether the CPU clocks down under full load (after running for a while). If so, thoroughly clean the notebook's cooling system (fan, heatpipe or similar) and place it on a grate or anything else that lifts it further off the surface to help cool it.

If cleaning by normal household means fails, a can of compressed air usually helps with harder cases of dust/dirt in the externally accessible cooling components.

Utility to verify the CPU clock in real time:
CPU-Z: <a href="http://www.falconfly-central.de/downloads/cpu-z-127.zip">http://www.falconfly-central.de/downloads/cpu-z-127.zip</a>
------------------------------------------------
View of the effect of my notebook overheating:
<a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=242786">http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=242786</a>
(getting worse over time)

<a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=455373">http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=455373</a>
(as you can see, the problem was corrected on 10 Jan 2005, and performance improved immediately)
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
ID: 10782
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10795 - Posted: 13 Mar 2005, 11:25:23 UTC

I checked the core speed after running CPDN on this notebook for about 4 hours, and it runs at 2.4 GHz as it should.

One thing to mention: some trickles ago the model crashed without reporting an error to the scheduler and began calculating the current trickle from the beginning again. But that happened some trickles ago. Now it is working fine apart from the increasing calculation time.

I don't think it's normal for the time per timestep to rise over the course of the model from 3.17 to 5.86 sec/TS. Something is wrong.
ID: 10795
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 10805 - Posted: 13 Mar 2005, 17:28:06 UTC - in response to Message 10795.  
Last modified: 13 Mar 2005, 17:30:48 UTC

Quite odd indeed.

The only things you could check for:

- Memory leaks (check the total amount of RAM usage and/or spurious HD activity)

- Other CPU-intensive background tasks
(Task Manager will reveal which processes took what percentage of CPU time; CPDN should always be around 99%. Software like antivirus scanners in particular is known to slow BOINC down once in a while; if that's the case, set it up to exclude the BOINC directories from background scanning)

- Screensaver
(if the GUI is installed) should not be used; the OpenGL graphics might draw a lot of CPU power away. Have the display shut down via power management instead, which further reduces power consumption and the heat generated by the video chip.

- Logon/logoff policies can sometimes disturb running programs if the system automatically logs off (requiring a password to log in again) after a predefined amount of time. If feasible, set the system never to log off automatically, so it always remains fully logged in.
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
ID: 10805
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10806 - Posted: 13 Mar 2005, 18:09:34 UTC - in response to Message 10805.  

&gt; - Memory leaks
The system provides 448 MB RAM (because some is shared with the video chip), and hadsm3um is currently using about 55 MB. The paging file seems normal at about 244 MB.

&gt; - other CPU-Intensive background tasks
No other tasks with a permanently significant CPU load. hadsm3um gets about 97 to 99 percent.

&gt; - Screensaver
The GUI is installed for more flexibility. No screensaver is running. The display is mostly shut down. No hadsm visualization is used.

&gt; - LogOn/LogOff policies
System is always logged in.

For more information about the host, <a href="http://microtoxic.ath.cx/cpuz.htm">look at the detailed output of CPU-Z</a>.

Thx &amp; Ciao, Tom
ID: 10806
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 10817 - Posted: 13 Mar 2005, 23:32:09 UTC - in response to Message 10806.  

Everything looks good. If there's nothing stealing CPU cycles, I'm afraid I'm out of ideas about what could cause the slowdown (except an unusual model run; I've seen some models slow down 'somewhat' the further they went into the computation)
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
ID: 10817
Honza
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 10830 - Posted: 14 Mar 2005, 6:55:02 UTC

On the classic board (where the hell is it?), there was a LONG thread about such a problem. Many CPDN users and regulars had plenty of ideas about what to look for (much more than here). None of them applied.
Finally, we got this mysterious thing solved. IIRC, I proposed an antivirus issue or something, which finally turned out to be the right point. No classic board, no joy, eh?
As your model is gradually slowing down, it may be correlated with disk fragmentation. I would suspend/exit the model and let defragmentation run...

<i>phpBB forum for CPDN, all are </i><a href="http://www.climateprediction.net/board">invited</a>
ID: 10830
old_user2147

Joined: 27 Aug 04
Posts: 55
Credit: 1,106,201
RAC: 0
Message 10838 - Posted: 14 Mar 2005, 8:12:29 UTC

FWIW - I see you're running a Northie (Northwood), which is a desktop P4. AFAIK, CPU-Z will NOT detect TM1 thermal throttling on a desktop P4. I know it doesn't on my desktop P4s.

<a href="http://www.panopsys.com/throttlewatch.php">Panopsys' "ThrottleWatch"</a> is what you want, when checking for P4 desktop TM1 throttling.

HTH

GL

Strat
ID: 10838
old_user2147

Joined: 27 Aug 04
Posts: 55
Credit: 1,106,201
RAC: 0
Message 10840 - Posted: 14 Mar 2005, 8:27:54 UTC

microtoxic -

I was gonna edit my earlier post, but decided to just post another. When running DC projects on a P4, you can easily spot TM1 throttling by setting up Task Manager to show both CPU usage &amp; kernel usage, and knowing what to look for.

If you're interested, post here, &amp; I'll respond with the details.

"ThrottleWatch" should be all you need, but I don't use it since I'm pretty good at spotting thermal throttling in Task Manager.

Strat
ID: 10840
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10842 - Posted: 14 Mar 2005, 9:22:21 UTC
Last modified: 14 Mar 2005, 9:23:57 UTC

Thx for your answers!

@Stratcat:
I didn't know about this thermal issue; I thought thermal management only reduced the CPU frequency. OK. I downloaded ThrottleWatch as you suggested and ran it on my P4 notebook. The graphs show that the CPU frequency is normal at 2400 MHz, but TM1 throttling varies between about 0 and 50%.

But I don't know whether this was also happening several days ago, when the sec/TS values were still normal.
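As far as I know, TM1 on a Pentium 4 throttles by switching the core clock on and off at a reduced duty cycle rather than lowering the clock frequency, which would fit CPU-Z and ThrottleWatch both reading a normal 2400 MHz. A rough consistency check is to ask how much throttling the observed slowdown would imply. The minimal Python sketch below rests on a loud assumption: sec/TS scales inversely with the effective clock and the model's work per timestep stayed constant. The function name is hypothetical.

```python
NOMINAL_MHZ = 2400  # clock reported by CPU-Z / ThrottleWatch in this thread


def implied_throttle_fraction(baseline_sec_per_ts, observed_sec_per_ts):
    """Share of clock cycles that would have to be gated off to turn the baseline
    sec/TS into the observed one. ASSUMES sec/TS is inversely proportional to the
    effective clock rate and that nothing else about the workload changed."""
    return 1.0 - baseline_sec_per_ts / observed_sec_per_ts


baseline = 3.17  # sec/TS at the start of the model (first post)
for observed in (4.98, 5.48, 5.86):  # later per-trickle figures quoted in the thread
    f = implied_throttle_fraction(baseline, observed)
    print(f"{observed:.2f} sec/TS -> ~{f:.0%} throttled, ~{NOMINAL_MHZ * (1 - f):.0f} MHz effective")
```

For the worst trickle (5.86 sec/TS against 3.17 at the start) this works out to roughly 46%, within the 0 to 50% range ThrottleWatch showed. That is consistent with throttling but doesn't prove it, since the model's own cost per timestep can also vary.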

How can I see thermal throttling with the normal Windows Task Manager?

@Honza
I ran a defragmentation but didn't see any heavy fragmentation. Additionally, I excluded the BOINC folder from virus scanning, but that wasn't the cause, as the processing speed had been normal before. Maybe it's running a little bit faster now.

@all
Look again at the <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=391890">Trickle History</a> of this host.
It reported one trickle yesterday evening with an inner-trickle speed of 4.98 sec/TS. The inner-trickle figure for the trickle before was 5.86, much higher. Maybe due to thermal throttling. But maybe due to something else?

Thx, Tom
ID: 10842
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 10848 - Posted: 14 Mar 2005, 12:24:02 UTC - in response to Message 10842.  
Last modified: 14 Mar 2005, 12:25:22 UTC

...deleted...
ID: 10848
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10868 - Posted: 14 Mar 2005, 17:11:02 UTC

Well...

I made some additional calculations from the trickle history:

<pre>
Phase  Timestep  CPU time (s)  Inner-trickle speed (sec/TS)
2        151228       1464633  4.155156453
2        140426       1419749  4.960099981
2        129624       1366170  5.861692279
2        118822       1302852  5.478707647
2        108020       1243671  5.259026106
2         97218       1186863  4.968154046
2         86416       1133197  3.721347899
2         75614       1092999  4.060451768
2         64812       1049138  3.193390113
2         54010       1014643  3.121088687
2         43208        980929  3.772079245
2         32406        940183  3.959081652
2         21604        897417  3.119144603
2         10802        863724  1.301610813
1        259248        849664  4.362988335
1        248446        802535  3.674041844
1        237644        762848  3.658489169
1        226842        723329  3.579707462
1        216040        684661  3.504813923
1        205238        646802  3.586835771
1        194436        608057  3.511571931
1        183634        570125  3.147380115
1        172832        536127  3.155989632
1        162030        502036  3.121088687
1        151228        468322  3.200240696
1        140426        433753  3.081466395
1        129624        400467  3.048879837
1        118822        367533  3.182095908
1        108020        333160  3.134419552
1         97218        299302  3.065358267
1         86416        266190  3.055545269
1         75614        233184  3.165154601
1         64812        198994  3.084151083
1         54010        165679  3.050546195
1         43208        132727  3.071005369
1         32406         99554  3.173949269
1         21604         65269  2.987317163
1         10802         33000  3.054989817
</pre>

As one can see, the time per 10802 TS in phase 1 is very stable. Only in phase 2 does the model claim more and more CPU time, with an additional need of 50% on average and 85% at the highest compared with the overall average.
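The inner-trickle speed column can be reproduced from the cumulative CPU time: each value is the CPU seconds added since the previous trickle divided by the 10802 timesteps per trickle (for example, (1366170 - 1302852) / 10802 ≈ 5.8617). A small Python sketch, fed with consecutive phase-2 rows of the table above in oldest-first order; the helper name is hypothetical, and it assumes the rows are consecutive trickles.

```python
TS_PER_TRICKLE = 10802  # timesteps per trickle

# (phase, timestep, cumulative CPU seconds), oldest first: consecutive phase-2 rows from the table.
history = [
    (2, 86416, 1133197),
    (2, 97218, 1186863),
    (2, 108020, 1243671),
    (2, 118822, 1302852),
    (2, 129624, 1366170),
    (2, 140426, 1419749),
    (2, 151228, 1464633),
]


def inner_trickle_speeds(rows, ts_per_trickle=TS_PER_TRICKLE):
    """sec/TS for each trickle: CPU seconds added since the previous row / timesteps per trickle.
    Assumes consecutive trickles, so the timestep counter resetting at a phase change doesn't matter."""
    return [
        (phase, ts, (cpu - prev_cpu) / ts_per_trickle)
        for (_, _, prev_cpu), (phase, ts, cpu) in zip(rows, rows[1:])
    ]


speeds = inner_trickle_speeds(history)
for phase, ts, speed in speeds:
    print(f"phase {phase}, TS {ts:6d}: {speed:.3f} sec/TS")
```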

Can't this be explained by normal model behavior?

Ciao, Tom
ID: 10868
Honza
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 10903 - Posted: 15 Mar 2005, 7:47:28 UTC

Any extreme climate?
Model rewinds?
<i>phpBB forum for CPDN, all are </i><a href="http://www.climateprediction.net/board">invited</a>
ID: 10903
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 10961 - Posted: 15 Mar 2005, 20:00:42 UTC - in response to Message 10903.  
Last modified: 15 Mar 2005, 23:09:36 UTC

&gt; Any extreme climate?
No, it doesn't look like it.

&gt; Model rewinds?
Some trickles ago the notebook crashed. After that I suspected CPDN had rewound to the beginning of the current trickle. But now I'm not sure.

ID: 10961
old_user2467

Joined: 28 Aug 04
Posts: 90
Credit: 2,736,552
RAC: 0
Message 11309 - Posted: 23 Mar 2005, 10:15:02 UTC

Mystery resolved.

Hi @all!

The mystery of the increasing sec/TS has been resolved. The tips about thermal throttling and insufficient cooling were both helpful. I opened the case of my notebook and looked at the CPU cooler. From the outside of the case it looked normal and not contaminated with too much dust. But looking at the cooler from inside the case revealed the problem: the transition from the cooler channel to the heatsink was clogged with dust. So the air couldn't reach the heatsink and the CPU was getting too hot, which caused the thermal throttling feature to intervene.

The glaring difference between the benchmark results measured before and after is amazing.

Before cleaning: Whetstone  866, Dhrystone 2579
After cleaning:  Whetstone 1245, Dhrystone 3665
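To put the before/after benchmark figures in relative terms, a short Python sketch of the arithmetic (the function name is illustrative only):

```python
# Benchmark figures from the post above.
before = {"Whetstone": 866, "Dhrystone": 2579}
after = {"Whetstone": 1245, "Dhrystone": 3665}


def relative_change(before, after):
    """Return {benchmark: (gain after cleaning, throttled score as a share of the cleaned score)}."""
    return {name: (after[name] / before[name] - 1, before[name] / after[name]) for name in before}


for name, (gain, share) in relative_change(before, after).items():
    print(f"{name}: +{gain:.1%} after cleaning; before cleaning it scored {share:.0%} of that")
```

Both benchmarks came out a bit over 40% higher after cleaning, meaning the throttled CPU had been delivering roughly 70% of its normal benchmark speed.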

Thanks for all your replies and best regards!
Tom
ID: 11309

©2024 cpdn.org