Thread 'processors, memory, performance and heat.'

Message boards : Number crunching : processors, memory, performance and heat.

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71403 - Posted: 7 Sep 2024, 11:32:31 UTC - in response to Message 71402.  
Last modified: 7 Sep 2024, 11:33:26 UTC

There's a worrying article about recent Intel chips in this month's UK edition of PC PRO magazine (no. 361, cover date October 2024). It says that "... There have long been reports of instability with Intel's "Raptor Lake" processors, with systems crashing under heavy load, and even dead chips." Apparently, it appears to mainly affect the bigger i7 and i9 versions.
I've been following that story. If I understand correctly, the chip can push too much voltage to the cores under certain circumstances, causing failure if it happens for an extended period. There's a microcode patch out now, and also motherboard BIOS updates. I've read that Arrow Lake has the voltage locked at 1.2V, but I'm not clear whether that was earlier silicon or the final chips. I checked the voltages on my 13th gen and they looked fine before any patching, even running all cores with CPDN tasks, so it might be a more complicated use case.
ID: 71403
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 71404 - Posted: 7 Sep 2024, 11:53:22 UTC - in response to Message 71403.  

I noticed the article especially because I've just attached host 1552676, an i5-14400. The system builder has built it round a Gigabyte motherboard, and I see they have a BIOS update dated 7 August. When the machine's quiet I'll perhaps look to see if they've applied that, and maybe flash it before loading it to the max.
ID: 71404
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71405 - Posted: 7 Sep 2024, 12:40:42 UTC - in response to Message 71401.  
Last modified: 7 Sep 2024, 12:48:59 UTC

Definitely improve the cooling if you hit 90C without all the cores running.
It is maybe a single spike hitting 90C every five minutes or so, according to Psensor, if I go above 12 cores running. 75% of the time it is below 80C.
ID: 71405
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71406 - Posted: 7 Sep 2024, 12:54:28 UTC - in response to Message 71405.  
Last modified: 7 Sep 2024, 12:59:50 UTC

It is maybe a single spike hitting 90C every five minutes or so, according to Psensor, if I go above 12 cores running. 75% of the time it is below 80C.
You probably know that AMD chips will usually throttle above 90C (assuming it's a desktop). It varies slightly from chip to chip. Might still be worth investigating additional cooling. You ought to be able to run all cores without throttling.
ID: 71406
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71407 - Posted: 7 Sep 2024, 14:43:08 UTC - in response to Message 71406.  

Might still be worth investigating additional cooling.


I will first look at adding an additional exhaust fan. Probably should not have gone for a midi case!
ID: 71407
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71408 - Posted: 7 Sep 2024, 14:58:11 UTC - in response to Message 71407.  

Might still be worth investigating additional cooling.
I will first look at adding an additional exhaust fan. Probably should not have gone for a midi case!
What cooling have you got on the CPU? Might not be the case itself.
ID: 71408
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71409 - Posted: 9 Sep 2024, 5:31:09 UTC

Curiously, what are these machines that take 1000+ hours to complete a task?
ID: 71409
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71410 - Posted: 9 Sep 2024, 6:44:59 UTC - in response to Message 71409.  

Curiously, what are these machines that take 1000+ hours to complete a task?


A couple of machines ago, I had a Core 2 Duo. I suspect that would have taken a while. An N2830 Celeron I found was taking over 600 hours a task.

The longest time I found was on computer 6,515,846, a Core 2 Duo, with 1809 hours for a task, but it was completing the majority of tasks in a fifth of that time. When I ran tasks on an old netbook, they would take months to complete tasks that finished on my Core 2 Duo in a couple of weeks.
ID: 71410
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71473 - Posted: 18 Sep 2024, 14:27:07 UTC
Last modified: 18 Sep 2024, 14:28:29 UTC

Further to the heat issues: if I restrict activity by the percentage of CPUs BOINC is allowed to use, the temperature goes a lot higher than if I have the same number of tasks running but limit them by suspending tasks. That is using WINE. When the current tasks have finished, I will see if the same happens with a VM or with native Linux tasks.

I am not coming up with any logic behind what I am seeing.
Edit: I shall over the next few days do some more investigation using top and other tools to see if I can find any clues.
ID: 71473
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71474 - Posted: 18 Sep 2024, 14:47:38 UTC - in response to Message 71473.  
Last modified: 18 Sep 2024, 14:48:00 UTC

Is the CPU throttling on the higher load?
ID: 71474
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71479 - Posted: 18 Sep 2024, 15:26:40 UTC - in response to Message 71474.  

Not that I can see. But restricting by suspending tasks rather than setting the number of CPUs in use leads to temps between 5 and 10C lower, with the bigger difference when more cores are in use. It will be interesting to see if the same applies with native Linux or with BOINC in a VM. But that will be a week away, as I want to get the tasks I have finished first.
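
For anyone wanting to check for throttling on Linux while comparing the two ways of limiting tasks, here is a minimal sketch, not a definitive method. It assumes the kernel exposes the standard cpufreq files (`scaling_cur_freq` and `cpuinfo_max_freq`, both in kHz) under `/sys/devices/system/cpu`; the function names are made up for illustration. Bear in mind that clocks well below the maximum under an all-core load can be normal boost behaviour rather than thermal throttling, so the number is only useful for comparing one setup against the other.

```python
from pathlib import Path


def read_core_frequencies(root="/sys/devices/system/cpu"):
    """Return {cpu_name: (current_mhz, max_mhz)} from the cpufreq sysfs files."""
    freqs = {}
    for cpufreq in Path(root).glob("cpu[0-9]*/cpufreq"):
        try:
            # Both files hold a single integer in kHz.
            cur_khz = int((cpufreq / "scaling_cur_freq").read_text())
            max_khz = int((cpufreq / "cpuinfo_max_freq").read_text())
        except (OSError, ValueError):
            continue  # skip cores whose files are missing or unreadable
        freqs[cpufreq.parent.name] = (cur_khz / 1000.0, max_khz / 1000.0)
    return freqs


def mean_fraction_of_max(freqs):
    """Average of current/max clock across cores; None if nothing was read."""
    if not freqs:
        return None
    return sum(cur / mx for cur, mx in freqs.values()) / len(freqs)
```

Calling `mean_fraction_of_max(read_core_frequencies())` a few times under each setup gives a rough figure to compare alongside the temperatures.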
ID: 71479
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71509 - Posted: 20 Sep 2024, 9:22:37 UTC
Last modified: 20 Sep 2024, 10:25:12 UTC

Been delving a little more into the temperatures on my Ryzen 9 and what the figures in Psensor mean. I found this: Tctl, the highest temperature seen, seems to be the temperature in the core of the chip. Edge, sensor1, sensor2 and composite temperatures have never been above 50C. The discussion I read on the link seems to suggest that Tctl going above 90C is nothing to worry about.
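
To see which device each reading comes from, the values can be read straight from the Linux hwmon sysfs interface. The sketch below is illustrative only and the function name is hypothetical; it assumes the usual layout of `/sys/class/hwmon/hwmonN/` with a `name` file, `tempN_input` files in millidegrees Celsius, and optional `tempN_label` files. On AMD desktop chips Tctl is normally reported by the `k10temp` driver, while labels such as Composite or edge usually belong to other devices (an NVMe drive or a GPU, for instance), which may be why they stay so much cooler.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, label): degrees_C} for every temp*_input under root."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_input in sorted(chip.glob("temp*_input")):
            stem = temp_input.name[: -len("_input")]  # e.g. "temp1"
            label_file = chip / f"{stem}_label"
            # Fall back to the file stem when the driver provides no label.
            label = label_file.read_text().strip() if label_file.exists() else stem
            try:
                millidegrees = int(temp_input.read_text())
            except (OSError, ValueError):
                continue  # some sensors refuse reads or return junk
            temps[(chip_name, label)] = millidegrees / 1000.0
    return temps
```

Printing the result of `read_hwmon_temps()` shows each reading next to the driver that reports it, which makes it easier to tell the CPU sensor apart from the rest.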
ID: 71509
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71510 - Posted: 20 Sep 2024, 10:31:10 UTC - in response to Message 71509.  

I believe the 7950X max temp. is 95C, which I think means that it's sustainable for long stretches. Modern CPUs are highly unlikely to overheat and break; they'll throttle instead. You can probably run your CPU between 90 and 95C just fine for long stretches of time. One thing to look into is undervolting. You'll likely find that you can run the CPU at a given speed a good amount cooler.

I have a 5900X which has a 90C max temp. I've had it run at or slightly higher than 90C on really hot days, as per CoreTemp. I might reduce the load a bit if it looks like it's going to be really hot for days in a row but otherwise I don't worry about temps.
ID: 71510
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71511 - Posted: 20 Sep 2024, 10:43:05 UTC - in response to Message 71510.  

With 14 cores running, my 7950X has a couple of spikes every five minutes above 90C. Mostly it is below 85C.
Till I started doing some reading about what the different temperatures indicated I had decided to cut down a bit. I have gone back up to 45% of CPUs now.
ID: 71511
ChelseaOilman
Joined: 24 Dec 19
Posts: 32
Credit: 40,970,218
RAC: 78,855
Message 71518 - Posted: 20 Sep 2024, 14:51:08 UTC - in response to Message 71511.  

With 14 cores running, my 7950X has a couple of spikes every five minutes above 90C. Mostly it is below 85C.
Till I started doing some reading about what the different temperatures indicated I had decided to cut down a bit. I have gone back up to 45% of CPUs now.

If you're running Windows you can install Ryzen Master and set it to ECO Mode. It will run a lot cooler and use less electricity.

https://www.amd.com/en/products/software/ryzen-master.html

You can achieve the same results in Linux but you'll have to change things in the BIOS.
ID: 71518
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71520 - Posted: 20 Sep 2024, 15:33:32 UTC - in response to Message 71518.  

I am using Linux and WINE to run BOINC. I will have a play with the BIOS once the current batch of work is finished. I am assuming Ryzen Master won't work in a VM.
ID: 71520

©2024 cpdn.org