Re: [PATCH 0/3] idle, Honor Hardware Disabled States

From: Len Brown
Date: Thu Mar 31 2016 - 00:59:39 EST

> Len,
> Your patch does
> + skl_cstates[5].disabled = 1; /* C8-SKL */
> + skl_cstates[6].disabled = 1; /* C9-SKL */
> and I don't think that is correct for SKY-H.

It is correct.

> Your patch does not take into account that the states are explicitly disabled
> in MSR_NHM_SNB_PKG_CST_CFG_CTL. That is the problem here and what you've done
> is simply hammered a disable into those states.

Are we talking about the failure in
or a different problem?

> Additionally, your patch does not show the user the correct state information:
> [root@dhcp40-125 ~]# egrep ^ /sys/devices/system/cpu/cpu0/cpuidle/state?/disable
> /sys/devices/system/cpu/cpu0/cpuidle/state0/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state1/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state2/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state3/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state4/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state5/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state6/disable:1:0
> /sys/devices/system/cpu/cpu0/cpuidle/state7/disable:1:0 << should be 1
> /sys/devices/system/cpu/cpu0/cpuidle/state8/disable:1:0 << should be 1

The 'disable' attribute you see in sysfs is not
struct cpuidle_state.disabled
(a driver-level flag that applies to all CPUs); it is
struct cpuidle_state_usage.disable
(a per-CPU setting the user can write at run time).

> The fix is to honour the settings in MSR_NHM_SNB_PKG_CST_CFG_CTL. I cannot say
> for certain that ALL SKY-H are impacted (you are admittedly in better position
> to say so or not). I can say that on the 2 systems tested here the
> MSR_NHM_SNB_PKG_CST_CFG_CTL do have the appropriate disable value set.
> /me could be missing some important info -- again, perhaps there are some
> SKY-H's out there that do not have states disabled in
> MSR_NHM_SNB_PKG_CST_CFG_CTL, and that's why I've proposed rebasing on top of
> your change.

Do you see this debug message when you run current upstream on this hardware?

	/* if state marked as disabled, skip it */
	if (cpuidle_state_table[cstate].disabled != 0) {
		pr_debug(PREFIX "state %s is disabled",

If no, then my patch is not disabling C8/C9 on your system.

Also, if it were, the code above causes the states to not appear
at all in sysfs, because they are not registered.
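
A minimal sketch of that registration behaviour, using simplified stand-in
structures rather than the real kernel definitions. The names state_sketch,
state_usage_sketch and register_states() are hypothetical and exist only for
illustration. It shows why a state whose driver-level flag is set never gets
a sysfs directory, while the per-CPU flag behind the sysfs file is a separate
field that registration does not consult:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's state table entry. */
struct state_sketch {
	const char *name;
	bool disabled;			/* driver-level: skip at registration */
};

/* Hypothetical stand-in for the per-CPU usage record behind sysfs. */
struct state_usage_sketch {
	unsigned long long disable;	/* user-writable, per CPU */
};

/*
 * Copy pointers to the usable states from table[] into out[], skipping
 * any state the driver marked as disabled. Returns the number of states
 * registered. Skipped states get no slot, so nothing would be exposed
 * for them afterwards.
 */
size_t register_states(const struct state_sketch *table, size_t n,
		       const struct state_sketch **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (table[i].disabled)
			continue;	/* never registered, never visible */
		out[count++] = &table[i];
	}
	return count;
}
```

With a three-entry table in which only the middle entry has disabled set,
register_states() returns 2 and the middle entry is absent from out[].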


If PC10 is disabled in MSR_NHM_SNB_PKG_CST_CFG_CTL, then functionally it
doesn't matter what we do, which is why my patch does nothing when PC10 is
disabled.

In such a scenario, the presence of PC10 in sysfs (and cpufreq)
is cosmetic. The hardware knows what to do.

Do you think that cosmetic issue is worth dealing with?
Note that the decoding of that MSR changes with every CPU,
so to get it right (like turbostat does), we'd need a table.
Also, it would be useful only for states that are package-only states,
i.e., we can't disable CC7 just because PC7 is disabled, etc.
So you could remove PC8, PC9, PC10 from sysfs on SKL
when they are disabled, but that is all.
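
A rough sketch of what such a table-driven approach could look like. Every
specific here is an assumption made for illustration: the model numbers, the
table contents, the 3-bit field mask, and the names decode_pkg_limit() and
may_hide() are hypothetical and are not taken from turbostat, intel_idle, or
any hardware documentation.

```c
#include <stdbool.h>
#include <stddef.h>

enum pkg_cstate { PC0, PC2, PC3, PC6, PC7, PC8, PC9, PC10, PC_UNLIMITED };

/* One decode table per CPU model, since the encoding differs per CPU. */
struct pkg_limit_table {
	unsigned int model;		/* HYPOTHETICAL model id */
	enum pkg_cstate limits[8];	/* HYPOTHETICAL meaning of each code */
};

static const struct pkg_limit_table tables[] = {
	{ 0xA1, { PC0, PC2, PC3, PC6, PC7, PC8, PC9, PC10 } },
	{ 0xA2, { PC0, PC2, PC6, PC6, PC7, PC7, PC_UNLIMITED, PC_UNLIMITED } },
};

/*
 * Decode the package C-state limit from the MSR value for a given model.
 * The low three bits are assumed to hold the limit code. Unknown models
 * decode as PC_UNLIMITED so that nothing is hidden by mistake.
 */
enum pkg_cstate decode_pkg_limit(unsigned int model, unsigned long long msr)
{
	for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++)
		if (tables[i].model == model)
			return tables[i].limits[msr & 0x7];
	return PC_UNLIMITED;
}

/*
 * Only a package-only state may be hidden, and only when it needs a
 * package state deeper than the limit. A state that also provides a
 * deeper core state (e.g. CC7) must stay.
 */
bool may_hide(bool package_only, enum pkg_cstate needed, enum pkg_cstate limit)
{
	return package_only && limit != PC_UNLIMITED && needed > limit;
}
```

Under these assumed tables, a limit that decodes to PC8 would allow hiding a
package-only PC10 state, while a state that also changes the core C-state is
never hidden regardless of the limit.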

Len Brown, Intel Open Source Technology Center