Re: [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint
From: Patryk Wlazlyn
Date: Tue Nov 26 2024 - 06:37:07 EST
>> The MWAIT instruction needs different hints on different CPUs to reach
>> the most specific idle states. The current hint calculation* in
>> mwait_play_dead() code works in practice on current hardware, but it
>> fails on a recent one, Intel's Sierra Forest and possibly some future ones.
>> Those newer CPUs' power efficiency suffers when the CPU is put offline.
>>
>> * The current calculation is the for loop inspecting edx in
>> mwait_play_dead()
>>
>> The current implementation for looking up the mwait hint for the deepest
>> cstate, in mwait_play_dead() code works by inspecting CPUID leaf 0x5 and
>> calculates the mwait hint based on the number of reported substates.
>> This approach depends on the hints associated with them to be continuous
>> in the range [0, NUM_SUBSTATES-1]. This continuity is not documented and
>> is not met on the recent Intel platforms.
>>
>> For example, Intel's Sierra Forest report two cstates with two substates
>> each in cpuid leaf 5:
>>
>> Name* target cstate target subcstate (mwait hint)
>> ===========================================================
>> C1 0x00 0x00
>> C1E 0x00 0x01
>>
>> -- 0x10 ----
>>
>> C6S 0x20 0x22
>> C6P 0x20 0x23
>>
>> -- 0x30 ----
>>
>> /* No more (sub)states all the way down to the end. */
>> ===========================================================
>>
>> * Names of the cstates are not included in the CPUID leaf 0x5, they are
>> taken from the product specific documentation.
>>
>> Notice that hints 0x20 and 0x21 are skipped entirely for the target
>> cstate 0x20 (C6), being a cause of the problem for the current cpuid
>> leaf 0x5 algorithm.
>>
>> Allow cpuidle code to provide mwait play dead loop with known, mwait
>> hint for the deepest idle state on a given platform, skipping the cpuid
>> based calculation.
>>
>> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@xxxxxxxxxxxxxxx>
>
> I'm going to risk saying that the changelog doesn't match the code
> changes in the patch any more.
>
> The code changes are actually relatively straightforward: The bottom
> half of mwait_play_dead() is split off, so it can be called from
> multiple places.
>
> The other places from which to call it are cpuidle drivers
> implementing :enter_dead() callbacks that may want to use MWAIT as the
> idle state entry method. The ACPI processor_idle driver and
> intel_idle will be updated by subsequent patches to do so.
>
> The reason for it is mostly consistency: If the cpuidle driver uses a
> specific idle state for things like suspend-to-idle, it is better to
> let it decide what idle state to use for "play_dead" because it may
> know better.
>
> Another reason is what mwait_play_dead() does to determine the MWAIT
> argument (referred to as a "hint"), but IMO it belongs in a changelog
> of a later patch because this one doesn't actually do anything about
> it. In fact, it is not expected to change the behavior of the code.
The commit message here is to justify the change. I thought that giving
some context is important. Do you suggest moving this under a
different commit or don't mention the SRF and C6 states at all?