Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry

From: Andy Lutomirski
Date: Fri Apr 05 2019 - 09:50:34 EST




> On Apr 5, 2019, at 2:35 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
>> On Mon, 25 Mar 2019, Thomas Gleixner wrote:
>>> On Fri, 15 Mar 2019, Chang S. Bae wrote:
>>> ENTRY(paranoid_exit)
>>> UNWIND_HINT_REGS
>>> DISABLE_INTERRUPTS(CLBR_ANY)
>>> TRACE_IRQS_OFF_DEBUG
>>> + ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase", "nop",\
>>> + X86_FEATURE_FSGSBASE
>>> + wrgsbase %rbx
>>> + jmp .Lparanoid_exit_no_swapgs;
>>
>> Again. A few newlines would make it more readable.
>>
>> This modifies the semantics of paranoid_entry and paranoid_exit. Looking at
>> the usage sites there is the following code in the nmi maze:
>>
>> /*
>> * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
>> * as we should not be calling schedule in NMI context.
>> * Even with normal interrupts enabled. An NMI should not be
>> * setting NEED_RESCHED or anything that normal interrupts and
>> * exceptions might do.
>> */
>> call paranoid_entry
>> UNWIND_HINT_REGS
>>
>> /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
>> movq %rsp, %rdi
>> movq $-1, %rsi
>> call do_nmi
>>
>> /* Always restore stashed CR3 value (see paranoid_entry) */
>> RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>>
>> testl %ebx, %ebx /* swapgs needed? */
>> jnz nmi_restore
>> nmi_swapgs:
>> SWAPGS_UNSAFE_STACK
>> nmi_restore:
>> POP_REGS
>>
>> I might be missing something, but how is that supposed to work when
>> paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then
>> there is a big fat comment missing explaining why.
>
> So this _is_ broken.
>
> On entry:
>
> rbx = rdgsbase()
> wrgsbase(KERNEL_GS)
>
> On exit:
>
> if (ebx == 0)
> swapgs
>
> The resulting matrix:
>
> | ENTRY GS | RBX | EXIT | GS on IRET | RESULT
> | | | | |
> 1 | KERNEL_GS | KERNEL_GS | EBX == 0 | USER_GS | FAIL
> | | | | |
> 2 | KERNEL_GS | KERNEL_GS | EBX != 0 | KERNEL_GS | ok
> | | | | |
> 3 | USER_GS | USER_GS | EBX == 0 | USER_GS | ok
> | | | | |
> 4 | USER_GS | USER_GS | EBX != 0 | KERNEL_GS | FAIL
>
>
> #1 Just works by chance because it's unlikely that the lower 32bits of a
> per CPU kernel GS are all 0.
>
> But it's just a question of probability that this turns into a
> non-debuggable once per year crash (think KASLR).
>
> #4 This can happen when the NMI hits the kernel in some other entry code
> _BEFORE_ or _AFTER_ swapgs.
>
> User space using GS addressing with GS[31:0] != 0 will crash and burn.
>
>

Hi all-

In a previous incarnation of these patches, I complained about the use of SWAPGS in the paranoid path. Now Iâm putting my maintainer foot down. On a non-FSGSBASE system, the paranoid path known, definitively, which GS is where, so SWAPGS is annoying. With FSGSBASE, unless you start looking at the RIP that you interrupted, you cannot know whether you have user or kernel GSBASE live, since they can have literally the same value. One of the numerous versions of this patch compared the values and just said âwell, itâs harmless to SWAPGS if user code happens to use the same value as the kernelâ. I complained that it was far too fragile.

So Iâm putting my foot down. If you all want my ack, youâre going to save the old GS, load the new one with WRGSBASE, and, on return, youâre going to restore the old one with WRGSBASE. You will not use SWAPGS in the paranoid path.

Obviously, for the non-paranoid path, it all keeps working exactly like it does now.

Furthermore, if you folks even want me to review this series, the ptrace tests need to be in place. On inspection of the current code (after the debacle a few releases back), it appears the SETREGSETâs effect depends on the current values in the registers â it does not actually seem to reliably load the whole state. So my confidence will be greatly increased if your series first adds a test that detects that bug (and fails!), then fixes the bug in a tiny little patch, then adds FSGSBASE, and keeps the test working.

âAndy