Re: [PATCH 3/7] x86/enter: Use IBRS on syscall and interrupts

From: Dave Hansen
Date: Thu Jan 04 2018 - 15:45:22 EST


On 01/04/2018 09:56 AM, Tim Chen wrote:
> If NMI runs when exiting kernel between IBRS_DISABLE and
> SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
> left enabled when exiting the NMI. IBRS bit 0 would then be left
> enabled in userland until the next enter kernel.
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.

Can I suggest and alternate description for the NMI case? This is
long-winded, but it should keep me from having to think through it yet
again. :)

"
The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.

The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as
the normal interrupt code, we might do the following;

SYSENTER_entry
<-------------- NMI HERE
IBRS=1
do_something()
IBRS=0
SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1, in the NMI, we might get the
following case:

SYSENTER_entry
IBRS=1
do_something()
IBRS=0
<-------------- NMI HERE (set IBRS=1)
SYSRET

and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again. (This is the case
Tim is alluding to in the patch description).

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim. This is what PTI does with CR3 and it works beautifully.
"