Re: [PATCH 2/4] x86/entry: Use asm_noist_exc_nmi() for NMI in early booting stage

From: Thomas Gleixner
Date: Mon May 03 2021 - 17:45:53 EST


On Mon, May 03 2021 at 22:24, Thomas Gleixner wrote:

> On Mon, May 03 2021 at 22:13, Thomas Gleixner wrote:
>
>> On Tue, Apr 27 2021 at 07:09, Lai Jiangshan wrote:
>>> + *
>>> + * While the other entries for the exceptions which use Interrupt stacks can
>>> + * be also used on the kernel stack, asm_exc_nmi() can not be used on the
>>> + * kernel stack for it relies on the RSP-located "NMI executing" variable
>>> + * which expects to on a fixed location in the NMI IST stack. For early
>>> + * booting stage, asm_noist_exc_nmi() is used for NMI.
>>> */
>>> static const __initconst struct idt_data def_idts[] = {
>>> INTG(X86_TRAP_DE, asm_exc_divide_error),
>>> - INTG(X86_TRAP_NMI, asm_exc_nmi),
>>> + INTG(X86_TRAP_NMI, asm_noist_exc_nmi),
>>
>> Actually this is a x86_64 only problem. The 32bit variant is fine, but
>> for consistency there is no problem to have that extra entry point on
>> 32bit as well.
>
> Bah, no. This patch breaks 32bit because on 32bit nothing sets the entry
> to asm_exc_nmi() later on.

Sigh. Finding a fixes tag for this is complicated.

The problem was introduced in 4.14 with b70543a0b2b6 ("x86/idt: Move
regular trap init to tables").

Before that, trap_init() installed an IST gate right away, but looking
deeper, this has been broken forever because there is a chicken-and-egg
problem.

ISTs only work after TSS is initialized and the ordering here is:

    trap_init()
      init_idt()
      cpu_init()
        init_tss()

So the original code had a window between init_idt() and init_tss().
Any exception which uses an IST goes south in that window because the
TSS is not initialized yet.
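
To make that window concrete, here is a toy model in plain C. It is a
sketch and not kernel code: every name in it (toy_cpu, toy_set_gate(),
toy_init_tss(), toy_deliver()) is made up for illustration, and it
assumes the only thing that matters is whether the TSS has been set up
at the time an IST gate is taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the kernel. */
enum toy_gate_kind { TOY_GATE_NONE, TOY_GATE_KERNEL_STACK, TOY_GATE_IST };
enum toy_outcome { TOY_HANDLED, TOY_FAULT_NO_GATE, TOY_FAULT_TSS_UNINIT };

#define TOY_NUM_VECTORS 32
#define TOY_TRAP_NMI    2

struct toy_cpu {
	bool tss_ready;                          /* set by toy_init_tss() */
	enum toy_gate_kind idt[TOY_NUM_VECTORS]; /* one gate per vector */
};

/* Stands in for the init_idt() step: install a gate for a vector. */
static void toy_set_gate(struct toy_cpu *cpu, unsigned int vector,
			 enum toy_gate_kind kind)
{
	assert(vector < TOY_NUM_VECTORS);
	cpu->idt[vector] = kind;
}

/* Stands in for the init_tss() step: IST stacks become usable. */
static void toy_init_tss(struct toy_cpu *cpu)
{
	cpu->tss_ready = true;
}

/* An IST gate takes its stack from the TSS, so it needs tss_ready. */
static enum toy_outcome toy_deliver(const struct toy_cpu *cpu,
				    unsigned int vector)
{
	assert(vector < TOY_NUM_VECTORS);
	switch (cpu->idt[vector]) {
	case TOY_GATE_IST:
		return cpu->tss_ready ? TOY_HANDLED : TOY_FAULT_TSS_UNINIT;
	case TOY_GATE_KERNEL_STACK:
		return TOY_HANDLED;
	case TOY_GATE_NONE:
	default:
		return TOY_FAULT_NO_GATE;
	}
}
```

Installing TOY_GATE_IST for TOY_TRAP_NMI and delivering before
toy_init_tss() runs yields TOY_FAULT_TSS_UNINIT in this model, while a
TOY_GATE_KERNEL_STACK gate is handled at any point. The model says
nothing about the RSP-located "NMI executing" variable.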

b70543a0b2b6 traded the above for the NMI issue. All other
exceptions are fine...

I'll think about it tomorrow some more...