Re: [PATCH v3 3/5] x86/enter: Use IBRS on syscall and interrupts
From: Paolo Bonzini
Date: Wed Jan 10 2018 - 06:27:25 EST
On 10/01/2018 11:04, Peter Zijlstra wrote:
> On Tue, Jan 09, 2018 at 06:26:47PM -0800, Tim Chen wrote:
>> Set IBRS upon kernel entrance via syscall and interrupts. Clear it
>> upon exit. IBRS protects against unsafe indirect branching predictions
>> in the kernel.
>>
>> The NMI interrupt save/restore of IBRS state was based on Andrea
>> Arcangeli's implementation.
>> Here's an explanation by Dave Hansen on why we save IBRS state for NMI.
>>
>> The normal interrupt code uses the 'error_entry' path which uses the
>> Code Segment (CS) of the instruction that was interrupted to tell
>> whether it interrupted the kernel or userspace and thus has to switch
>> IBRS, or leave it alone.
>>
>> The NMI code is different. It uses 'paranoid_entry' because it can
>> interrupt the kernel while it is running with a userspace IBRS (and %GS
>> and CR3) value, but has a kernel CS. If we used the same approach as
>> the normal interrupt code, we might do the following;
>>
>> SYSENTER_entry
>> <-------------- NMI HERE
>> IBRS=1
>> do_something()
>> IBRS=0
>> SYSRET
>>
>> The NMI code might notice that we are running in the kernel and decide
>> that it is OK to skip the IBRS=1. This would leave it running
>> unprotected with IBRS=0, which is bad.
>>
>> However, if we unconditionally set IBRS=1, in the NMI, we might get the
>> following case:
>>
>> SYSENTER_entry
>> IBRS=1
>> do_something()
>> IBRS=0
>> <-------------- NMI HERE (set IBRS=1)
>> SYSRET
>>
>> and we would return to userspace with IBRS=1. Userspace would run
>> slowly until we entered and exited the kernel again.
>>
>> Instead of those two approaches, we chose a third one where we simply
>> save the IBRS value in a scratch register (%r13) and then restore that
>> value, verbatim.
>>
>
> What this Changelog fails to address is _WHY_ we need this. What does
> this provide that retpoline does not.
Which, for the record, is just that it works better on Skylake+ CPUs.
Paolo