Re: [PATCH v3 2/3] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()
From: Peter Zijlstra
Date: Tue Nov 12 2024 - 10:50:27 EST
On Tue, Nov 12, 2024 at 02:44:49PM +0200, Artem Bityutskiy wrote:
> On Tue, 2024-11-12 at 13:18 +0100, Peter Zijlstra wrote:
> > But on Intel we really don't want HLT, and had that MWAIT, but that has
> > real problems with KEXEC. And I don't think we can rely on INTEL_IDLE=y.
>
> If INTEL_IDLE is not set, then we'll just use the existing mwait hint creation
> algorithm in 'mwait_play_dead()', which works too, just not ideal.
So why not fix the substate detection function and ignore everything?
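
For readers following along, the substate detection referred to here derives
the MWAIT hint from CPUID leaf 5 EDX, which reports the number of sub-states
per C-state in 4-bit fields. What follows is a minimal sketch of that idea,
loosely modeled on the logic in mwait_play_dead() and not the kernel code
itself. The function name deepest_mwait_hint() and the early return for "no
sub-states found" are assumptions made for illustration.

```c
#include <stdint.h>

#define MWAIT_SUBSTATE_SIZE 4    /* bits per C-state field in CPUID.05H:EDX */
#define MWAIT_SUBSTATE_MASK 0xfu /* mask for one 4-bit sub-state count */

/*
 * Hypothetical helper: given the EDX value returned by CPUID leaf 5,
 * build an MWAIT hint (the EAX argument) for the deepest C-state that
 * advertises at least one sub-state.
 *
 * Hint layout: bits 7:4 select the C-state (0 means C1), bits 3:0 the
 * sub-state. Returns 0 when nothing beyond C0 is enumerated (assumption).
 */
static uint32_t deepest_mwait_hint(uint32_t edx)
{
    uint32_t highest_cstate = 0;
    uint32_t highest_subcstate = 0;
    int i;

    /* Skip the C0 field; field 0 now describes C1. */
    edx >>= MWAIT_SUBSTATE_SIZE;

    for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
        if (edx & MWAIT_SUBSTATE_MASK) {
            highest_cstate = (uint32_t)i;
            highest_subcstate = edx & MWAIT_SUBSTATE_MASK;
        }
    }

    if (!highest_subcstate)
        return 0;

    /* Sub-states are numbered from 0, so the last one is count - 1. */
    return (highest_cstate << MWAIT_SUBSTATE_SIZE) | (highest_subcstate - 1);
}
```

Note that this only reflects what CPUID enumerates; whether the resulting hint
really maps to the deepest state on a given part is exactly what is being
questioned in this thread.
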
> > Anyway, ideally x86 would grow a new instruction to offline a CPU, both
> > MWAIT and HLT have problems vs non-maskable interrupts.
> ... snip ...
> > But as said, we need a new instruction.
>
> FYI, I already started discussing a special "gimme the deepest C-state" mwait
> hint - just a constant like 0xFF. CPUID leaf 5 has many reserved bits, one could
> be used for enumeration of this feature.
>
> But this is just a quick idea, with only informal discussions so far.
No, not an mwait hint. We need an instruction that:
- goes to deepest C state
- drops into WAIT-for-Start-IPI (SIPI)
Notably, it should not wake from:
- random memory writes
- NMI, MCE, SMI and other such non-maskable thingies
- anything else -- the memory pointed to by RIP might no longer exist
Let's call the instruction: DEAD.
With the mwait 'hack', kexec still goes belly up if it gets a spurious
NMI (and them others) at an inopportune time, and this does happen
afaik. Just not often enough to worry the data center guys the way the
mwait thing does.