Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

From: hpa
Date: Tue Dec 26 2017 - 21:36:55 EST

On December 26, 2017 6:16:37 PM PST, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>On Tue, Dec 26, 2017 at 3:19 PM, Alexandru Chirvasitu
><achirvasub@xxxxxxxxx> wrote:
>> I went back to the initial problematic commit e802a51 and modified it
>> as you suggest:
>Thank you.
>> This did not work out for me, but now it fails differently. Both
>> (kexec -l + kexec -e) and (kexec -p + echo c > /proc/sysrq-trigger)
>> end in call traces and freezes.
>> It does seem to be tied to idt_invalidate. One of the last things I
>> see on the screen (which ends up frozen along with the computer)
>> is
>> EIP: idt_invalidate+0x6/0x40 SS:ESP: 0068:f6c47cd0
>Yes, interesting, it's the stack canary load access there:
> mov %gs:0x14,%edx
>that traps.
>And that actually makes a lot of sense: the load_segments() call just
>above has reloaded all segments with __KERNEL_DS.
>So while the stack canary access *intends* to load it from the magic
>stack canary segment (offset 0x14), we've just reset all segments to
>the standard zero-based full-sized ones, and obviously that will take
>a page fault at 0x14.
>And the reason you now actually *see* the page fault is that we
>haven't completely buggered the CPU state now, so the trap handler
>actually works. With the GDT reset before, it used to take that same
>trap, but now the trap handler itself would fault, and cause a triple
>fault - which resets the machine.
>So it wasn't actually tracing, it was the stack canary all along. So
>at least it's truly root-caused now.
>But the fix is the same: we just can't afford to do any function calls.
>Alternatively, we should just fix that insane "load_segments()". I'm
>not sure why the code insists on reloading the segments in the first
>place.
>So you could try just to remove the "load_segments()" line entirely.
>Thanks for spending the time testing things out,
> Linus

This is why I personally prefer to see these kinds of terminal stubs written in assembly explicitly: the C compiler simply doesn't have all the information needed to do the right thing.

I'm personally very sceptical of nuking the GDT unless we're in real mode. There seems to be no point, and it just opens up failure modes.
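
To make the address arithmetic in Linus's explanation concrete, here is a toy model in C. It is only a sketch, not kernel code: struct seg_model, gs_linear_address(), is_mapped() and every base, limit and mapping value used with them are hypothetical and chosen for illustration. The only number taken from the thread is the 0x14 offset in "mov %gs:0x14,%edx".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not a kernel structure): a segment reduced to
 * the two fields that matter for this argument. */
struct seg_model {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset inside the segment */
};

/* Offset used by the faulting instruction: mov %gs:0x14,%edx */
#define CANARY_OFFSET 0x14u

/* Linear address formed for a %gs-relative access: base + offset.
 * The assert stands in for the segment limit check. */
static uint32_t gs_linear_address(struct seg_model gs, uint32_t offset)
{
    assert(offset <= gs.limit);
    return gs.base + offset;
}

/* Toy paging check: assume only [mapped_lo, mapped_hi) is mapped. */
static bool is_mapped(uint32_t addr, uint32_t mapped_lo, uint32_t mapped_hi)
{
    return addr >= mapped_lo && addr < mapped_hi;
}
```

With an assumed canary segment base of 0xc1000000, gs_linear_address() yields 0xc1000014 for the canary load. Once %gs describes a zero-based, full-sized segment, the same instruction resolves to linear address 0x14, which is_mapped() reports as unmapped for any mapping that excludes the first page. That corresponds to the page fault seen at idt_invalidate+0x6.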