Re: [RFC PATCH v4 4/8] x86/smp: Allow calling mwait_play_dead with arbitrary hint

From: Rafael J. Wysocki
Date: Tue Nov 26 2024 - 06:46:19 EST


On Tue, Nov 26, 2024 at 12:37 PM Patryk Wlazlyn
<patryk.wlazlyn@xxxxxxxxxxxxxxx> wrote:
>
> >> The MWAIT instruction needs different hints on different CPUs to reach
> >> the most specific idle states. The current hint calculation* in
> >> mwait_play_dead() code works in practice on current hardware, but it
> >> fails on a recent one, Intel's Sierra Forest and possibly some future ones.
> >> Those newer CPUs' power efficiency suffers when the CPU is put offline.
> >>
> >> * The current calculation is the for loop inspecting edx in
> >> mwait_play_dead()
> >>
> >> The current implementation for looking up the mwait hint for the deepest
> >> cstate, in mwait_play_dead() code works by inspecting CPUID leaf 0x5 and
> >> calculates the mwait hint based on the number of reported substates.
> >> This approach depends on the hints associated with them to be continuous
> >> in the range [0, NUM_SUBSTATES-1]. This continuity is not documented and
> >> is not met on the recent Intel platforms.
> >>
> >> For example, Intel's Sierra Forest report two cstates with two substates
> >> each in cpuid leaf 5:
> >>
> >> Name* target cstate target subcstate (mwait hint)
> >> ===========================================================
> >> C1 0x00 0x00
> >> C1E 0x00 0x01
> >>
> >> -- 0x10 ----
> >>
> >> C6S 0x20 0x22
> >> C6P 0x20 0x23
> >>
> >> -- 0x30 ----
> >>
> >> /* No more (sub)states all the way down to the end. */
> >> ===========================================================
> >>
> >> * Names of the cstates are not included in the CPUID leaf 0x5, they are
> >> taken from the product specific documentation.
> >>
> >> Notice that hints 0x20 and 0x21 are skipped entirely for the target
> >> cstate 0x20 (C6), being a cause of the problem for the current cpuid
> >> leaf 0x5 algorithm.
> >>
> >> Allow cpuidle code to provide mwait play dead loop with known, mwait
> >> hint for the deepest idle state on a given platform, skipping the cpuid
> >> based calculation.
> >>
> >> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@xxxxxxxxxxxxxxx>
> >
> > I'm going to risk saying that the changelog doesn't match the code
> > changes in the patch any more.
> >
> > The code changes are actually relatively straightforward: The bottom
> > half of mwait_play_dead() is split off, so it can be called from
> > multiple places.
> >
> > The other places from which to call it are cpuidle drivers
> > implementing :enter_dead() callbacks that may want to use MWAIT as the
> > idle state entry method. The ACPI processor_idle driver and
> > intel_idle will be updated by subsequent patches to do so.
> >
> > The reason for it is mostly consistency: If the cpuidle driver uses a
> > specific idle state for things like suspend-to-idle, it is better to
> > let it decide what idle state to use for "play_dead" because it may
> > know better.
> >
> > Another reason is what mwait_play_dead() does to determine the MWAIT
> > argument (referred to as a "hint"), but IMO it belongs in a changelog
> > of a later patch because this one doesn't actually do anything about
> > it. In fact, it is not expected to change the behavior of the code.
>
> The commit message here is to justify the change. I thought that giving
> some context is important. Do you suggest moving this under a
> different commit or don't mention the SRF and C6 states at all?

I would move it to the patch that actually makes a difference for SRF.
This one doesn't do that.