Re: [PATCH] x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
From: Eric W. Biederman
Date: Thu Dec 28 2017 - 19:31:35 EST
hpa@xxxxxxxxx writes:
> On December 28, 2017 2:47:47 PM PST, ebiederm@xxxxxxxxxxxx wrote:
>>Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>>
>>> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> Date: Wed, 27 Dec 2017 11:41:30 -0800
>>> Subject: [PATCH] x86-32: fix kexec with stack canary
>>(CONFIG_CC_STACKPROTECTOR)
>>>
>>> Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned
>>up
>>> and unified the IDT invalidation that existed in a couple of places.
>>It
>>> changed no actual real code.
>>>
>>> Despite not changing any actual real code, it _did_ change code
>>> generation: by implementing the common idt_invalidate() function in
>>> archx86/kernel/idt.c, it made the use of the function in
>>> arch/x86/kernel/machine_kexec_32.c be a real function call rather
>>than
>>> an (accidental) inlining of the function.
>>>
>>> That, in turn, exposed two issues:
>>>
>>> - in load_segments(), we had incorrectly reset all the segment
>>> registers, which then made the stack canary load (which gcc does
>>> using offset of %gs) cause a trap. Instead of %gs pointing to the
>>> stack canary, it will be the normal zero-based kernel segment, and
>>> the stack canary load will take a page fault at address 0x14.
>>>
>>> - to make this even harder to debug, we had invalidated the GDT just
>>> before calling idt_invalidate(), which meant that the fault
>>happened
>>> with an invalid GDT, which in turn causes a triple fault and
>>> immediate reboot.
>>>
>>> Fix this by
>>>
>>> (a) not reloading the special segments in load_segments(). We
>>currently
>>> don't do any percpu accesses (which would require %fs on x86-32)
>>in
>>> this area, but there's no reason to think that we might not want
>>to
>>> do them, and like %gs, it's pointless to break it.
>>>
>>> (b) doing idt_invalidate() before invalidating the GDT, to keep
>>things
>>> at least _slightly_ more debuggable for a bit longer. Without a
>>> IDT, traps will not work. Without a GDT, traps also will not
>>work,
>>> but neither will any segment loads etc. So in a very real sense,
>>> the GDT is even more core than the IDT.
>>>
>>> Reported-and-tested-by: Alexandru Chirvasitu <achirvasub@xxxxxxxxx>
>>> Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation")
>>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>>> Cc: Peter Anvin <hpa@xxxxxxxxx>
>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> ---
>>>
>>> I wrote "Reported-and-tested-by: Alexandru" because while this isn't
>>> exactly the same patch as anything Alexandru tested, it's pretty
>>close,
>>> and I'm pretty sure this version will fix his issues too.
>>>
>>> I decided to try to just do the minimal changes: the GDT invalidation
>>last
>>> (because of the debugging) and _only_ removing the resetting of fs/gs
>>
>>> rather than removing load_segments() entirely.
>>>
>>> I think making idt_invalidate() be inline would be a good thing as
>>well,
>>> and I do think that all those "phys_to_virt(0)" things are garbage,
>>but I
>>> also think they are independent issues, so I didn't touch any of
>>that.
>>>
>>> I'm assuming I'll get this patch back through the x86 tree, and will
>>not
>>> be applying it to my own git tree unless the x86 people ask me to.
>>>
>>> Comments?
>>
>>There is one significant problem with this patch. It changes the ABI
>>that kexec provides to the next kernel.
>>
>>That ABI is that the segments will be set to a well defined value.
>>That value is flat 32bit segments with a base address of 0.
>>
>>By removing %fs and %gs from load_segments they are now effectively
>>random undefined values, to the next kernel.
>>
>>I don't know if anything actually cares. But if they do they are now
>>broken. It is easy enough to preserve that invariant I don't see
>>a point in risking potential breaking and looking to see if we have
>>actually broken the ABI.
>>
>>It feels like this is something we should move into assembly rather
>>than attempting to cater to the changing evironment of C code in the
>>kernel. Or if not we need a big fat comment be very very careful
>>this code is special.
>>
>>Eric
>>
>>
>>> arch/x86/kernel/machine_kexec_32.c | 4 +---
>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/machine_kexec_32.c
>>b/arch/x86/kernel/machine_kexec_32.c
>>> index 00bc751c861c..edfede768688 100644
>>> --- a/arch/x86/kernel/machine_kexec_32.c
>>> +++ b/arch/x86/kernel/machine_kexec_32.c
>>> @@ -48,8 +48,6 @@ static void load_segments(void)
>>> "\tmovl $"STR(__KERNEL_DS)",%%eax\n"
>>> "\tmovl %%eax,%%ds\n"
>>> "\tmovl %%eax,%%es\n"
>>> - "\tmovl %%eax,%%fs\n"
>>> - "\tmovl %%eax,%%gs\n"
>>> "\tmovl %%eax,%%ss\n"
>>> : : : "eax", "memory");
>>> #undef STR
>>> @@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image)
>>> * The gdt & idt are now invalid.
>>> * If you want to load them you must set up your own idt & gdt.
>>> */
>>> - set_gdt(phys_to_virt(0), 0);
>>> idt_invalidate(phys_to_virt(0));
>>> + set_gdt(phys_to_virt(0), 0);
>>>
>>> /* now call it */
>>> image->start = relocate_kernel_ptr((unsigned long)image->head,
>
> The ABI the kernel requires on entry is also documented, and we should
> stick to that.
Wrong interface this, does not transfer directly to a linux kernel. This
transfers to a shim that starts a linux kernel or something else.
It is way past time to be having design discussions about what this
interface should do. It is more than a decade old.
> That being said, the bottom line is to just stop putting these kinds
> of final handovers into C and just hope the compiler (or
> tracing/debugging developers) doesn't randomly break at some thing.
In general I agree. That makes the code a little less approachable, but
it would seem to remove the chance of surprise interactions even more.
Eric