Re: [PATCH v3 2/3] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()

From: Peter Zijlstra
Date: Tue Nov 12 2024 - 08:50:42 EST


On Tue, Nov 12, 2024 at 01:30:29PM +0100, Rafael J. Wysocki wrote:

> > > Then we are back to the original approach though:
> > >
> > > https://lore.kernel.org/linux-pm/20241029101507.7188-3-patryk.wlazlyn@xxxxxxxxxxxxxxx/
> >
> > Well, that won't be brilliant for hybrid systems where the available
> > states are different per CPU.
>
> But they aren't.
>
> At least so far that has not been the case on any platform known to me
> and I'm not aware of any plans to make that happen (guess what, some
> other OSes may be unhappy).

Well, that's something at least.

> > Also, all of this is a bit of a trainwreck... AFAICT AMD wants IO based
> > idle (per the 2018 commit). So they want the ACPI thing.
>
> Yes.
>
> > But on Intel we really don't want HLT, and had that MWAIT, but that has
> > real problems with KEXEC. And I don't think we can rely on INTEL_IDLE=y.
>
> We could because it handles ACPI now and ACPI idle doesn't add any
> value on top of it except for the IO-based idle case.

You're saying we can mandate INTEL_IDLE=y? Because currently defconfig
doesn't even have it on.

> > The ACPI thing doesn't support FFh states for its enter_dead(), should it?
>
> It does AFAICS, but the FFH is still MWAIT.

What I'm trying to say is that acpi_idle_play_dead() doesn't seem to
support FFh and as such won't ever use MWAIT.

> > Anyway, ideally x86 would grow a new instruction to offline a CPU, both
> > MWAIT and HLT have problems vs non-maskable interrupts.
> >
> > I really don't know what is best here, maybe moving that whole CPUID
> > loop to boot, store the value in a per-cpu mwait_play_dead_hint. Have
> > AMD explicitly clear the value, and avoid mwait when 0 -- hint 0 is
> > equal to HLT anyway.
> >
> > But as said, we need a new instruction.
>
> Before that, there is the problem with the MWAIT hint computation in
> mwait_play_dead() and in fact intel_idle does know what hint to use in
> there.

But we need to deal with INTEL_IDLE=n. Also, I don't see any MWAIT_LEAF
parsing in intel_idle.c. Yes, it requests the information, but then it
mostly ignores it -- it only consumes two ECX bits or so.

I don't see it finding a max-cstate from mwait_substates anywhere.

So given we don't have any such code, why can't we simply fix the cstate
parsing we have in mwait_play_dead() and call it a day?
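
For reference, a rough user-space sketch of the kind of MWAIT leaf parsing
being discussed. It is an illustration only, not the code that lives in
mwait_play_dead(): the helper name mwait_deepest_hint() is made up, and it
assumes the CPUID leaf 5 EDX layout of 4-bit sub-state counts per C-state
with the C0 field first. It picks the deepest C-state that enumerates any
sub-states (tolerating gaps) and returns 0 when nothing is enumerated, which
a caller could treat as "don't bother with MWAIT".

```c
#include <stdint.h>

#define MWAIT_SUBSTATE_SIZE	4	/* bits per C-state field in EDX */
#define MWAIT_SUBSTATE_MASK	0xf

/*
 * Hypothetical helper (not a kernel function): derive an MWAIT hint from
 * the EDX value of CPUID leaf 5. Hint bits 7:4 select the C-state
 * (0 == C1), bits 3:0 select the sub-state.
 */
static uint32_t mwait_deepest_hint(uint32_t edx)
{
	uint32_t cstate = 0, substates = 0;
	int i;

	edx >>= MWAIT_SUBSTATE_SIZE;	/* skip the C0 field */
	for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
		if (edx & MWAIT_SUBSTATE_MASK) {
			/* remember the deepest state that has sub-states */
			cstate = i;
			substates = edx & MWAIT_SUBSTATE_MASK;
		}
	}

	if (!substates)
		return 0;	/* nothing enumerated beyond C0 */

	/* use the last sub-state of the deepest enumerated C-state */
	return (cstate << MWAIT_SUBSTATE_SIZE) | (substates - 1);
}
```

The result could be computed once at boot and cached per CPU, along the
lines of the mwait_play_dead_hint idea quoted above, with a vendor that
wants to avoid MWAIT simply storing 0.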