ia32_sysenter_target does not preserve EFLAGS

From: Denys Vlasenko
Date: Fri Mar 27 2015 - 10:26:12 EST


Hi,

While running some tests I noticed that EFLAGS
is not saved across syscalls if I use 32-bit
userspace, use SYSENTER, and paravirt is active.

Looking at the code, it's actually clear why that happens.

/*
* SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
* IF and VM in rflags are cleared (IOW: interrupts are off).
* SYSENTER does not save anything on the stack,
* and does not save old rip (!!!) and rflags.
*/
ENTRY(ia32_sysenter_target)
SWAPGS_UNSAFE_STACK <============================
movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp
ENABLE_INTERRUPTS(CLBR_NONE)

movl %ebp, %ebp
movl %eax, %eax
movl ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d

/* Construct struct pt_regs on stack */
pushq_cfi $__USER32_DS /* pt_regs->ss */
pushq_cfi %rbp /* pt_regs->sp */
CFI_REL_OFFSET rsp,0
pushfq_cfi /* pt_regs->flags */

The SWAPGS_UNSAFE_STACK, it's it involves paravirt callbacks,
will change EFLAGS, and it *can't* save/restore them -
there is no place to save it, since neither stack nor
PER_CPU() is usable at that point.

Interestingly, *no one ever complained*!

Apparently, users *don't* depend on arithmetic flags
to survive over syscall. They also okay with DF flag
being cleared.

Let's go flag-by-flag.

ID - probably no one depends on it
VIP,VIF,VM - v86 stuff, not supported in 64bit
AC - someone probably do use this
RF - should be cleared to 0
NT - iret via task gate, not supported in 64bit
IOPL - usually 00, sys_iopl() can change it
DF - according to C ABI, should be 0
IF - should be preserved (but almost always 1)
TF - should be preserved
arith flags - probably no one cares

IOW. Bits to be preseved are only AC, IOPL, TF, and _maybe_
IF.

AC and IOPL are preserved even with this paravirt quirk
because paravirt hooks do not mangle them.

TF preservation and proper restoration is handled by
do_debug + syscall_trace_enter_phase2 + iret
combo.

We unconditionally set IF. This is only a problem for applications
which use sys_iopl(3) and, disable IRQs in userspace and perform
syscalls. The set of such apps is probably empty.
(This "bug" exists even for non-paravirt case).

So, formally, we have a bug: we do not preserve IF,
DF and arith flags.

I'm proposing to use this opportunity to amend syscall ABI
to say that arith flags are not preserved across syscalls,
and DF can be cleared to 0 by syscalls (but can't be set to 1).
Evidently, it's broken for some time for some virtualized
setups and users are okay.

I'm not sure what to do with the "bug" of forcing IF=1.
Fix it? Or also declare that syscalls can set IF=1?
Do you think this is a legitimate userspace code?

sys_iopl(3);
cli;
syscall();
/* expects irqs still disabled */

--
vda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/