Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace

From: Andy Lutomirski
Date: Wed Oct 01 2014 - 10:47:24 EST


On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@xxxxxxxxx> wrote:
> On Wed, 1 Oct 2014 09:09:13 -0500
> Chuck Ebbert <cebbert.lkml@xxxxxxxxx> wrote:
>
>> On Tue, 30 Sep 2014 21:51:27 -0700
>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> > The NT flag doesn't do anything in long mode other than causing IRET
>> > to #GP. Oddly, CPL3 code can still set NT using popf.
>> >
>> > Entry via hardware or software interrupt clears NT automatically, so
>> > the only relevant entries are fast syscalls.
>> >
>> > If user code causes kernel code to run with NT set, then there's at
>> > least some (small) chance that it could cause trouble. For example,
>> > user code could cause a call to EFI code with NT set, and who knows
>> > what would happen? Apparently some games on Wine sometimes do
>> > this (!), and, if an IRET return happens, they will segfault. That
>> > segfault cannot be handled, because signal delivery fails, too.
>> >
>> > This patch programs the CPU to clear NT on entry via SYSCALL (both
>> > 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
>> > in software on entry via SYSENTER.
>> >
>> > To save a few cycles, this borrows a trick from Jan Beulich in Xen:
>> > it checks whether NT is set before trying to clear it. As a result,
>> > it seems to have very little effect on SYSENTER performance on my
>> > machine.
>> >
>> > Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
>> >
>> > I haven't touched anything on 32-bit kernels.
>> >
>> > The syscall mask change comes from a variant of this patch by Anish
>> > Bhatt.
>> >
>> > Cc: stable@xxxxxxxxxxxxxxx
>> > Reported-by: Anish Bhatt <anish@xxxxxxxxxxx>
>> > Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>> > ---
>> > arch/x86/ia32/ia32entry.S | 12 ++++++++++++
>> > arch/x86/kernel/cpu/common.c | 2 +-
>> > 2 files changed, 13 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> > index 4299eb05023c..44d1dd371454 100644
>> > --- a/arch/x86/ia32/ia32entry.S
>> > +++ b/arch/x86/ia32/ia32entry.S
>> > @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>> >  1:     movl (%rbp),%ebp
>> >         _ASM_EXTABLE(1b,ia32_badarg)
>> >         ASM_CLAC
>> > +
>> > +       /*
>> > +        * Sysenter doesn't filter flags, so we need to clear NT
>> > +        * ourselves. To save a few cycles, we can check whether
>> > +        * NT was set instead of doing an unconditional popfq.
>> > +        */
>> > +       testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
>> > +       jz 1f
>> > +       pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> > +       popfq_cfi
>> > +1:
>> > +
>>
>> I think you've gone backwards with this version. The earlier one got
>> some of the performance loss back by not needing to do the "cld" insn.
>>
>> You should just replace that "cld" (line 146) with
>>
>> pushfq_cfi $2
>> popfq_cfi
>>
>> Unfortunately I'm not set up to test that yet. But I did look at
>> the SDM and can't see a need to preserve any of the flags.
>>
>
>
> <sigh> that's:
>
> pushfw_cfi $0x202
>
> IF needs to stay on because we've already enabled interrupts after
> sysenter.

I tried exactly this. It was much slower than the version I sent.
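
For anyone following along, here is a rough C model of the two approaches
being compared. It is only a sketch: the function names and the wrote_flags
out-parameter are made up for illustration and do not exist in the kernel.
The flag values are the architectural EFLAGS bit positions. The model says
nothing about cycle counts; it only shows which path has to rewrite the
flags register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bits (same values the kernel's X86_EFLAGS_* use). */
#define X86_EFLAGS_FIXED 0x0002u /* bit 1, always reads as 1 */
#define X86_EFLAGS_IF    0x0200u /* bit 9, interrupt enable */
#define X86_EFLAGS_DF    0x0400u /* bit 10, direction flag */
#define X86_EFLAGS_NT    0x4000u /* bit 14, nested task */

/*
 * Hypothetical model of the patch's approach: test NT first and only
 * rewrite the flags (the pushq/popfq pair) when it is actually set.
 * *wrote_flags reports whether the popfq path was taken.
 */
static uint64_t filter_nt_conditional(uint64_t flags, bool *wrote_flags)
{
    if (!(flags & X86_EFLAGS_NT)) {
        *wrote_flags = false;   /* common case: no flags write at all */
        return flags;
    }
    *wrote_flags = true;
    return X86_EFLAGS_IF | X86_EFLAGS_FIXED;
}

/*
 * Hypothetical model of the suggested alternative: always load 0x202.
 * This also clears DF, which is why the separate cld could be dropped,
 * but every entry pays for the flags write.
 */
static uint64_t filter_flags_unconditional(uint64_t flags)
{
    (void)flags;                /* incoming flags are discarded */
    return X86_EFLAGS_IF | X86_EFLAGS_FIXED;
}
```

Note that in the conditional model an entry with DF set but NT clear leaves
DF alone, so that variant still depends on the existing cld.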

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC