[PATCH][v2] intel_idle: Adjust the SKX C6 latency and residency if PC6 is disabled

From: Chen Yu
Date: Thu May 27 2021 - 23:16:08 EST


Currently cpuidle assumes worst-case C-state parameters, and so C6
is described with PC6 parameters, which is worst case for requesting
CC6. When PC6 is enabled, this is appropriate. But if PC6 is disabled
in BIOS, the exit latency and target_residency should be adjusted
accordingly.

Exit latency:
Previously the C6 exit latency was measured when woken up from CC6/PC6.
With PC6 disabled, the C6 exit latency should be CC6/PC0.

Target residency:
With PC6 disabled, idle duration within [CC6, PC6) would make the
idle governor choose C1E over C6. This would cause low energy-efficiency.
We should lower the bar to request C6 when PC6 is disabled.

To fill this gap, check if PC6 is disabled in the BIOS in the
MSR_PKG_CST_CONFIG_CONTROL(0xe2). If so, use CC6/PC0 parameters as the
new exit latency. Meanwhile, update target_residency to 3 times of the new
exit latency. This is consistent with how intel_idle driver uses _CST to
calculate the target_residency. The consequence is that, the OS would
be more offen to choose C6 over C1E when PC6 is disabled. This is reasonable
because if the user is using C6, it implies that the user cares about energy,
so choosing C6 more frequently is in accordance with user requirement.

The new exit latency of CC6/PC0 92us was from wult[1] result on SKX, which was
measured via NIC wakeup from 99.99th latency. Besides SKX, the CLX and CPX
both have the same CPU model number. And since they have similar CC6 exit latency
to SKX, 96us and 89us respectively, reuse the value of SKX.

There is concern that if we should introduce a more generic solution
rather than optimizing on each platforms. However consider the
code complexity and different PC6 bit interpretation on different
platforms, tune the code per platform seems to be an acceptable trade-off.

[1] https://intel.github.io/wult/

Suggested-by: Len Brown <len.brown@xxxxxxxxx>
Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
---
v2: Simplify the commit log to not mention C3/PC3. (Artem)
Confirm the exit latency on CLX and CPX.(Artem)
---
drivers/idle/intel_idle.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ec1b9d306ba6..e6c543b5ee1d 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -1484,6 +1484,36 @@ static void __init sklh_idle_state_table_update(void)
skl_cstates[6].flags |= CPUIDLE_FLAG_UNUSABLE; /* C9-SKL */
}

+/**
+ * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake
+ * idle states table.
+ */
+static void __init skx_idle_state_table_update(void)
+{
+ unsigned long long msr;
+
+ rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
+
+ /*
+ * 000b: C0/C1 (no package C-state support)
+ * 001b: C2
+ * 010b: C6 (non-retention)
+ * 011b: C6 (retention)
+ * 111b: No Package C state limits.
+ */
+ if ((msr & 0x7) < 2) {
+ /*
+ * Uses the CC6 + PC0 latency and 3 times of
+ * latency for target_residency if the PC6
+ * is disabled in BIOS. This is consistent
+ * with how intel_idle driver uses _CST
+ * to set the target_residency.
+ */
+ skx_cstates[2].exit_latency = 92;
+ skx_cstates[2].target_residency = 276;
+ }
+}
+
static bool __init intel_idle_verify_cstate(unsigned int mwait_hint)
{
unsigned int mwait_cstate = MWAIT_HINT2CSTATE(mwait_hint) + 1;
@@ -1515,6 +1545,9 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
case INTEL_FAM6_SKYLAKE:
sklh_idle_state_table_update();
break;
+ case INTEL_FAM6_SKYLAKE_X:
+ skx_idle_state_table_update();
+ break;
}

for (cstate = 0; cstate < CPUIDLE_STATE_MAX; ++cstate) {
--
2.25.1