Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

From: Ingo Molnar
Date: Sun Dec 24 2017 - 06:29:02 EST



* Alexandru Chirvasitu <achirvasub@xxxxxxxxx> wrote:

> Thank you for the swift reply!
>
> On Sat, Dec 23, 2017 at 07:30:21PM -0800, Linus Torvalds wrote:
> > On Sat, Dec 23, 2017 at 5:44 PM, Alexandru Chirvasitu
> > <achirvasub@xxxxxxxxx> wrote:
> > >
> > > For testing purposes, I've altered machine_kexec_32.c making the
> > > following toy commit. It naively undoes part of e802a51, solely to
> > > confirm that's where it goes awry in my setup.
> >
> > That's really funky.
> >
> > The idt_invalidate() seems to do *exactly* the same thing. It uses
> > "load_idt()" on an IDT with size 0, and the supplied address.
> >
> > Can you disassemble your "set_idt()" code vs the "idt_invalidate()"?
> >
>
> I seem to have done some such thing just now, but please excuse some
> poking around in the dark here (I've disassembled code exactly once
> before: yesterday, in answering a similar request for more info in
> another lkml thread..).
>
> I'm actually not even certain the sequel is what you are asking.
>
> I couldn't find the set_idt symbol in
> arch/x86/kernel/machine_kexec_32.o. Google seemed to think this has
> something to do with the 'static' marker for that function, so I made
> another commit differing from the previous one only in that it removes
> that marker (i.e. set_idt is now 'void' rather than 'static void').
>
> I can now see that function with objdump. The relevant sections of
> objdump -D on the two files are:
>
> --- arch/x86/kernel/machine_kexec_32.o ---
>
> 00000180 <set_idt>:
> 180: e8 fc ff ff ff call 181 <set_idt+0x1>
> 185: 55 push %ebp
> 186: 89 e5 mov %esp,%ebp
> 188: 83 ec 0c sub $0xc,%esp
> 18b: 89 45 f8 mov %eax,-0x8(%ebp)
> 18e: 66 89 55 f6 mov %dx,-0xa(%ebp)
> 192: 8d 45 f6 lea -0xa(%ebp),%eax
> 195: 65 8b 0d 14 00 00 00 mov %gs:0x14,%ecx
> 19c: 89 4d fc mov %ecx,-0x4(%ebp)
> 19f: 31 c9 xor %ecx,%ecx
> 1a1: ff 15 20 00 00 00 call *0x20
> 1a7: 8b 45 fc mov -0x4(%ebp),%eax
> 1aa: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
> 1b1: 75 02 jne 1b5 <set_idt+0x35>
> 1b3: c9 leave
> 1b4: c3 ret
> 1b5: e8 fc ff ff ff call 1b6 <set_idt+0x36>
> 1ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
>
> ----------------------------------------------
>
> and
>
> --- arch/x86/kernel/idt.o ---
>
> 00000000 <idt_invalidate>:
> 0: e8 fc ff ff ff call 1 <idt_invalidate+0x1>
> 5: 55 push %ebp
> 6: 89 e5 mov %esp,%ebp
> 8: 83 ec 0c sub $0xc,%esp
> b: 65 8b 15 14 00 00 00 mov %gs:0x14,%edx
> 12: 89 55 fc mov %edx,-0x4(%ebp)
> 15: 31 d2 xor %edx,%edx
> 17: 31 d2 xor %edx,%edx
> 19: 89 45 f8 mov %eax,-0x8(%ebp)
> 1c: 8d 45 f6 lea -0xa(%ebp),%eax
> 1f: 66 89 55 f6 mov %dx,-0xa(%ebp)
> 23: ff 15 20 00 00 00 call *0x20
> 29: 8b 45 fc mov -0x4(%ebp),%eax
> 2c: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
> 33: 75 02 jne 37 <idt_invalidate+0x37>
> 35: c9 leave
> 36: c3 ret
> 37: e8 fc ff ff ff call 38 <idt_invalidate+0x38>
>
> -------------------------------
>
> I've also checked again that this newer compilation still gives a
> well-behaved kexec.

So guessing from the disassembly your kernel config seems to have both function
tracing and paravirt enabled - both can cause complications here, in particular
paravirt may override load_idt().

( In the end execution should run through the same paravirt ops with and without
your patch, so I don't see how it can make a difference but it's easier to look
at the disassembly without the paravirt indirection - and your patch obviously
makes a difference so we are missing some detail. )

So to simplify analysis, could you disable both please - i.e. disable these if
your .config has any of these set:

CONFIG_HYPERVISOR_GUEST=y
CONFIG_FUNCTION_TRACER=y
CONFIG_STACK_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y

The easiest way to disable these is to run this script over your .config, in the
kernel source code directory where you are building your kernel:

grep -vE 'CONFIG_HYPERVISOR_GUEST|CONFIG_FUNCTION_TRACER|CONFIG_STACK_TRACER|CONFIG_FUNCTION_GRAPH_TRACER' .config > .config2
/bin/mv .config2 .config

... then run 'make oldconfig' and answer 'N' to every question. Double check the
resulting .config via:

grep -E 'CONFIG_HYPERVISOR_GUEST|CONFIG_FUNCTION_TRACER|CONFIG_STACK_TRACER|CONFIG_FUNCTION_GRAPH_TRACER' .config

which should output:

# CONFIG_HYPERVISOR_GUEST is not set
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_STACK_TRACER is not set

Also, could you send us your .config?

Thanks,

Ingo