Re: Regression in 4.8 - CPU speed set very low

From: Larry Finger
Date: Thu Sep 29 2016 - 11:09:38 EST


On 09/29/2016 07:19 AM, Rafael J. Wysocki wrote:
On Wednesday, September 28, 2016 09:22:59 PM Larry Finger wrote:
On 09/27/2016 06:46 AM, Rafael J. Wysocki wrote:
On Tue, Sep 27, 2016 at 10:48 AM, Larry Finger
<Larry.Finger@xxxxxxxxxxxx> wrote:
On 09/26/2016 10:12 PM, Doug Smythies wrote:

On 2016.09.26 18:31 Srinivas Pandruvada wrote:

On Mon, 2016-09-26 at 19:48 -0500, Larry Finger wrote:

On 09/26/2016 07:21 PM, Rafael J. Wysocki wrote:

On Tue, Sep 27, 2016 at 1:53 AM, Larry Finger wrote:
But for both we need a reproducer anyway.

I do not have a reliable reproducer. The condition has always happened when
running a high-compute job such as a 'make -j8' on the kernel, or building the
RPM for openSUSE's implementation of VirtualBox. The latter is what I'm using
for most of my testing.


Run some CPU stressor and get all your CPUs going at 100% load.
And watch your core temperatures while you do so.


for i in 1 2 3 4; do while : ; do : ; done & done

triggered the fault in a few minutes.
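
The one-liner above leaves its loops running until they are killed by hand. A minimal sketch of a bounded variant follows, assuming a POSIX shell. The function name `burn` and its two arguments are hypothetical and were not part of the test actually run in this thread.

```shell
#!/bin/sh
# burn N SECONDS: start N busy loops, let them run for SECONDS, then stop them.
# Hypothetical helper for illustration only.
burn() {
    n="$1"
    secs="$2"
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Each subshell spins forever until it is killed below.
        ( while : ; do : ; done ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$secs"
    # $pids is intentionally unquoted so each PID is a separate argument.
    kill $pids 2>/dev/null || true
    wait $pids 2>/dev/null || true
    echo "stopped $n loops after ${secs}s"
}
```

For example, `burn 4 300` would keep four loops busy for five minutes, matching the four hardware threads used in the test above.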



It would also be good to rule out thermal throttling (as per
Srinivas' comments).


It is almost certainly thermal throttling, or something similar, causing
clock modulation of, it seems, 50%.


While the infinite loops were running, the temps were:

finger@linux-1t8h:~/rtlwifi_new> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +83.0°C (high = +84.0°C, crit = +100.0°C)
Core 0: +83.0°C (high = +84.0°C, crit = +100.0°C)
Core 1: +74.0°C (high = +84.0°C, crit = +100.0°C)

It looks like the trip point (high) temperature was exceeded causing
thermal throttling to kick in.

After the fault occurs, I get

finger@linux-1t8h:~/rtlwifi_new> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +44.0°C (high = +84.0°C, crit = +100.0°C)
Core 0: +43.0°C (high = +84.0°C, crit = +100.0°C)
Core 1: +41.0°C (high = +84.0°C, crit = +100.0°C)

So after that it stays at 400 MHz forever, right?


For now, please tell me what's in
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq

800000

Your effective freq is lower than 800MHz. One possible reason is
thermal throttling.
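
A minimal sketch of that comparison from a shell, assuming the standard cpufreq sysfs attributes `scaling_cur_freq` and `cpuinfo_min_freq` (both in kHz) are present. The function name `check_freq` and its optional directory argument are hypothetical. Whether `scaling_cur_freq` reflects the effect of clock modulation depends on the driver, so treat the result as a hint rather than a measurement.

```shell
#!/bin/sh
# check_freq [DIR]: compare the current cpufreq reading with the advertised
# hardware minimum. DIR defaults to cpu0's cpufreq directory.
# Hypothetical helper for illustration only.
check_freq() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    cur=$(cat "$dir/scaling_cur_freq") || return 2
    min=$(cat "$dir/cpuinfo_min_freq") || return 2
    if [ "$cur" -lt "$min" ]; then
        echo "below minimum: cur=${cur} kHz min=${min} kHz"
    else
        echo "ok: cur=${cur} kHz min=${min} kHz"
    fi
}
```

Calling `check_freq` with no argument reads cpu0; passing another CPU's cpufreq directory checks that CPU instead.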

What distro are you using?


And what make and model of laptop?


Toshiba Tecra A50-A with CPU Model: 6.60.3 "Intel(R) Core(TM) i7-4600M CPU @
2.90GHz". That is a dual-core unit with hyperthreading.

@Rafael: As I write this, the system has been running the infinite loop test
for almost 5 hours with kernel 4.7. I will leave that running while I'm
gone, but I am certain that it is OK.

OK, and what temperatures do you see while doing this?

finger@linux-1t8h:~/linux-2.6> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +90.0°C (high = +84.0°C, crit = +100.0°C)
Core 0: +90.0°C (high = +84.0°C, crit = +100.0°C)
Core 1: +78.0°C (high = +84.0°C, crit = +100.0°C)

Once again, the CPU temp is greater than the "high" value; however, the clock
rate continues to hold near 3600 MHz.
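
The same comparison can be made mechanically. A rough sketch, assuming `sensors` prints lines in the format shown above; the function name `over_high` is hypothetical, and the parsing has only been reasoned through against that format.

```shell
#!/bin/sh
# over_high: read `sensors` output on stdin and print every reading whose
# temperature is above its own "high" value. Hypothetical helper.
over_high() {
    awk '
    /high = / {
        # Label is everything before the first colon, e.g. "Core 0".
        label = $0
        sub(/:.*/, "", label)
        # Strip the label so the line starts with the temperature.
        rest = $0
        sub(/^[^:]*:[ \t]*/, "", rest)
        temp = rest + 0
        if (match(rest, /high = [+]?[0-9.]+/)) {
            # Skip the 7 characters of "high = " to reach the number.
            high = substr(rest, RSTART + 7, RLENGTH - 7) + 0
            if (temp > high)
                printf "%s: %.1f exceeds high %.1f\n", label, temp, high
        }
    }'
}
```

It would be used as `sensors | over_high`, which prints nothing when every reading is at or below its trip point.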

My laptop was inadvertently put to sleep while I was gone. I forgot to leave a
note for my wife and she quieted the noisy cpu fan. :)

It looks like in 4.8-rc we made a change that caused the "high" trip point to
be acted on.

Srinivas, Rui, do you recall what that can be?

One more question (I think I asked it previously): In the failing case
(4.8-rc1 and later), when the frequency drops to 400 MHz, does it
ever go back up, or is it stuck at that level forever?

In any case, it may help to file a bug at bugzilla.kernel.org against
CPU/thermal or similar and let me know the bug number. We'll need to
collect some tracepoint data to debug this, and we need a place to keep
it for easy reference.

Sorry if I missed that earlier question. The CPU is stuck at that lower frequency until I reboot.

Bug report at https://bugzilla.kernel.org/show_bug.cgi?id=173361. I tried to cover the main points of the discussion. Please add the ones that I missed.

Larry