intel_pstate throttling stuck at low frequency

From: Ben Gamari
Date: Sun Aug 09 2015 - 12:31:08 EST



Hello all,

I have a Dell Latitude E7440 running Ubuntu 15.04 which seems to be
suffering from the intel_pstate driver getting stuck in a throttled
state while under load. The issue typically occurs on warm days when
the machine is under load for an extended period of time (e.g. while
compiling).

Under these conditions performance gradually deteriorates as the CPU
frequency creeps lower and lower. In this dmesg log [1] from a recent
incident, we see that there were a couple of core and package throttling
events. This in itself isn't problematic; what is troubling is that
despite the fact that the temperature quickly returned to normal, the
CPU frequency remained at just below 400 MHz for the next hour or so
while I gathered data on the issue with the system under load. The
temperature held stable in the low 60s Celsius for this duration.
After I finished gathering data I killed the CPU-intensive process and
it took over ten minutes for frequency scaling to behave normally again,
eventually scaling up to 3.3 GHz when necessary. I experience these
sorts of events fairly regularly when placing the machine under load.
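
To watch this kind of slow decline as it happens, the per-CPU frequency
can be logged next to the kernel's thermal-throttle event counters, which
are exposed in sysfs under each cpuN/thermal_throttle directory. The
following is only a minimal sketch: the function name snapshot_cpus and
its optional root-directory argument are hypothetical (the argument exists
so the function can be pointed at a test directory), and what
scaling_cur_freq reports depends on the driver and kernel version.

```shell
#!/bin/sh
# Print one line per CPU: name, current frequency in kHz, and the core and
# package thermal-throttle event counters. Files that cannot be read are
# shown as "n/a".
# The root directory argument is a hypothetical convenience for testing;
# it defaults to the usual sysfs location.
snapshot_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        # Skip the unexpanded glob when no cpuN directory exists.
        [ -d "$dir" ] || continue
        freq=$(cat "$dir/cpufreq/scaling_cur_freq" 2>/dev/null || echo n/a)
        core=$(cat "$dir/thermal_throttle/core_throttle_count" 2>/dev/null || echo n/a)
        pkg=$(cat "$dir/thermal_throttle/package_throttle_count" 2>/dev/null || echo n/a)
        echo "${dir##*/} $freq $core $pkg"
    done
}
```

Something like `while sleep 10; do date; snapshot_cpus; done` left running
during a compile would show whether the throttle counters are still
rising while the frequency stays low, or whether they stopped long before.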

It seems to make no difference whether I use the powersave or
performance governor. This is strange, as most accounts I have seen claim
that the performance governor unconditionally pins the CPU at its
maximum frequency. Even if there were a thermal limit, the system
temperature in this case isn't terribly unreasonable (60 to 65 degrees
Celsius).

I've attached some further information gathered during the
incident, which occurred with a 4.2-rc5 kernel, although I have been
experiencing issues of this nature ever since I bought the machine
(mostly in the summer).

How would one further track down this issue? The kernel tree seems to
be rather lacking in documentation describing what factors enter
intel_pstate's scaling decisions. Is there any way to get better
visibility into this process?
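
One possible source of visibility is the power:pstate_sample tracepoint,
which intel_pstate emits each time it samples a CPU. The following is a
minimal sketch for switching it on and off, not a tested recipe for this
machine: it assumes debugfs is mounted at /sys/kernel/debug and that the
kernel was built with tracing support, and the function name
set_pstate_trace and its optional tracing-directory argument are
hypothetical.

```shell
#!/bin/sh
# Switch the power:pstate_sample tracepoint on (1) or off (0).
# The second argument is a hypothetical convenience for testing; the
# default assumes debugfs is mounted at /sys/kernel/debug.
set_pstate_trace() {
    state="$1"
    tracing="${2:-/sys/kernel/debug/tracing}"
    knob="$tracing/events/power/pstate_sample/enable"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (not root, or tracepoint missing?)" >&2
        return 1
    fi
    echo "$state" > "$knob"
}
```

As root, `set_pstate_trace 1 && cat /sys/kernel/debug/tracing/trace_pipe`
would then stream the samples while the machine is stuck at low
frequency, and `set_pstate_trace 0` turns the tracepoint off again.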

Any ideas on what might be going wrong here?

Cheers,

- Ben


[1] https://gist.github.com/bgamari/ae032532a13fa52a8a69


$ cpupower monitor
|Nehalem || SandyBridge || HaswellExtended || Mperf || Idle_Stats
CPU | C3 | C6 | PC3 | PC6 || C7 | PC2 | PC7 || PC8 | PC9 | PC10 || C0 | Cx | Freq || POLL | C1-H | C1E- | C3-H | C6-H | C7s- | C8-H | C9-H | C10-
0| 7.04| 5.22| 0.00| 0.00|| 31.01| 18.16| 0.00|| 0.00| 0.00| 0.00|| 40.21| 59.79| 388|| 0.00| 0.04| 0.57| 5.08| 3.23| 13.00| 8.80| 29.25| 0.00
2| 7.04| 5.22| 0.00| 0.00|| 31.01| 18.16| 0.00|| 0.00| 0.00| 0.00|| 27.59| 72.41| 379|| 0.00| 0.01| 0.20| 7.76| 5.16| 17.02| 19.51| 21.57| 1.15
1| 3.59| 2.92| 0.00| 0.00|| 41.40| 18.16| 0.00|| 0.00| 0.00| 0.00|| 32.14| 67.86| 394|| 0.00| 0.01| 0.26| 5.21| 4.30| 24.69| 6.45| 24.22| 2.83
3| 3.59| 2.92| 0.00| 0.00|| 41.40| 18.16| 0.00|| 0.00| 0.00| 0.00|| 26.58| 73.42| 367|| 0.00| 0.00| 0.11| 1.87| 1.14| 30.36| 5.54| 32.62| 1.95

$ cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.30 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.30 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 380 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +25.0C (crit = +107.0C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +63.0C (high = +100.0C, crit = +100.0C)
Core 0: +62.0C (high = +100.0C, crit = +100.0C)
Core 1: +63.0C (high = +100.0C, crit = +100.0C)

dell_smm-virtual-0
Adapter: Virtual device
Processor Fan: 6710 RPM
CPU: +62.0C
Ambient: +49.0C
SODIMM: +52.0C

$ cd /sys/devices/system/cpu/intel_pstate
$ cat {max,min}_perf_pct
100
100
$ cat no_turbo num_pstates turbo_pct
0
26
24

$ cd /sys/kernel/debug/pstate_snb
$ cat pgain_pct
20
$ cat igain_pct
0
$ cat dgain_pct
0
$ cd ../pkg_temp_thermal
$ cat pkg_thres_*
0
0
$ cd ../intel_powerclamp
$ cat powerclamp_calib
controlling cpu: 0
pct confidence steady dynamic (compensation)
0 0 0 0
1 0 0 0
2 0 0 0
... (remaining lines also all zeros)



$ sudo turbostat
CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
  -     175   47.37     369    2694
  0     210   55.23     380    2698
  2     219   61.70     354    2693
  1     139   36.09     385    2692
  3     131   36.45     360    2694
CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
  -     167   45.63     365    2696
  0     108   28.06     385    2695
  2     314   89.24     352    2698
  1     130   33.43     388    2698
  3     115   31.75     364    2694
CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
  -     174   46.53     373    2694
  0     176   45.48     386    2696
  2     200   55.75     360    2694
  1     179   46.42     385    2694
  3     139   38.47     362    2694
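
As a sanity check on the turbostat numbers: going by the column
definitions in the turbostat man page, Avg_MHz should be roughly
%Busy x Bzy_MHz / 100. The low averages are therefore not just an effect
of idle time; the clock while busy (Bzy_MHz) is itself around 370 MHz,
under half of the 800 MHz hardware minimum that cpupower reports. A small
sketch of the arithmetic, with implied_avg_mhz being a hypothetical helper
name:

```shell
#!/bin/sh
# Avg_MHz as implied by the other two columns: %Busy * Bzy_MHz / 100,
# rounded to the nearest MHz.
implied_avg_mhz() {
    awk -v busy="$1" -v bzy="$2" 'BEGIN { printf "%.0f\n", busy * bzy / 100 }'
}

implied_avg_mhz 47.37 369   # summary row of the first sample
implied_avg_mhz 55.23 380   # CPU 0 of the first sample
```

This prints 175 and 210, matching the Avg_MHz column for those two rows.
Other rows can differ by about 1 MHz because the inputs are already
rounded.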

$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu

Analyzing CPU 0:
Number of idle states: 9
Available idle states: POLL C1-HSW C1E-HSW C3-HSW C6-HSW C7s-HSW C8-HSW C9-HSW C10-HSW
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 19629
Duration: 4903415
C1-HSW:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 12066075
Duration: 2316078427
C1E-HSW:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1437624
Duration: 497058866
C3-HSW:
Flags/Description: MWAIT 0x10
Latency: 33
Usage: 1664168
Duration: 916288273
C6-HSW:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 456853
Duration: 353643717
C7s-HSW:
Flags/Description: MWAIT 0x32
Latency: 166
Usage: 1714991
Duration: 1671456695
C8-HSW:
Flags/Description: MWAIT 0x40
Latency: 300
Usage: 1435877
Duration: 1966505031
C9-HSW:
Flags/Description: MWAIT 0x50
Latency: 600
Usage: 1565954
Duration: 3739218646
C10-HSW:
Flags/Description: MWAIT 0x60
Latency: 2600
Usage: 118301
Duration: 955949684
