Re: [RFC] de-asmify the x86-64 system call slowpath

From: Al Viro
Date: Mon Jan 27 2014 - 06:37:36 EST


On Mon, Jan 27, 2014 at 11:27:59AM +0100, Peter Zijlstra wrote:

> Obviously I don't particularly like the SAVE_REST/FIXUP_TOP_OF_STACK
> being added to the reschedule path.
>
> Can't we do as Al suggested earlier and have 2 slowpath calls, one
> without PT_REGS and one with?
>
> That said, yes its a nice cleanup, entry.S always hurts my brain.

BTW, there's an additional pile of obfuscation:

        /* work to do on interrupt/exception return */
        #define _TIF_WORK_MASK \
                (0x0000FFFF & \
                 ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT| \
                   _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))

        /* work to do on any return to user space */
        #define _TIF_ALLWORK_MASK \
                ((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT | \
                 _TIF_NOHZ)

These guys are
        _TIF_NOTIFY_RESUME | _TIF_SIGPENDING | _TIF_MCE_NOTIFY |
        _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE | _TIF_NEED_RESCHED | 0xe200
and
        _TIF_SYSCALL_TRACE | _TIF_NOTIFY_RESUME | _TIF_SIGPENDING |
        _TIF_NEED_RESCHED | _TIF_SINGLESTEP | _TIF_SYSCALL_EMU |
        _TIF_SYSCALL_AUDIT | _TIF_MCE_NOTIFY | _TIF_SYSCALL_TRACEPOINT |
        _TIF_NOHZ | _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE | 0xe200
resp., or
        _TIF_DO_NOTIFY_MASK | _TIF_UPROBE | _TIF_NEED_RESCHED | 0xe200
and
        _TIF_DO_NOTIFY_MASK | _TIF_WORK_SYSCALL_EXIT | _TIF_NEED_RESCHED |
        _TIF_SYSCALL_EMU | _TIF_UPROBE | 0xe200
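
For anyone who wants to decode a leftover constant like 0xe200 mechanically, a
minimal sketch follows. The helper names set_bits() and print_bits() are made
up for illustration; nothing like them is implied to exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: store the indices of the set bits of mask in
 * out[], highest bit first, and return how many were found.
 */
static int set_bits(unsigned int mask, int out[32])
{
	int n = 0;

	assert(out != NULL);
	for (int bit = 31; bit >= 0; bit--)
		if (mask & (1u << bit))
			out[n++] = bit;
	return n;
}

/* Print a mask followed by the indices of its set bits. */
static void print_bits(unsigned int mask)
{
	int bits[32];
	int n = set_bits(mask, bits);

	printf("%#x:", mask);
	for (int i = 0; i < n; i++)
		printf(" %d", bits[i]);
	printf("\n");
}
```

Calling print_bits(0xe200) prints "0xe200: 15 14 13 9".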

0xe200 (aka bits 15,14,13,9) consists of the bits that are never set by
anybody, so short of really deep magic it can be discarded. The rest
is also interesting, to put it politely. Why is _TIF_UPROBE *not* a part
of _TIF_DO_NOTIFY_MASK, for example? Note that do_notify_resume() checks
and clears it, but on syscall (and interrupt) exit paths we only call it
with something in _TIF_DO_NOTIFY_MASK. If UPROBE is set, but nothing
in _TIF_DO_NOTIFY_MASK is, we'll be looping forever, right? There's pending
work (according to _TIF_WORK_MASK), so we won't just leave. And we won't
be calling do_notify_resume(), so there's nothing to clear that bit.
Only it gets even nastier - on the paranoid_userspace path we call
do_notify_resume() if anything in _TIF_WORK_MASK besides NEED_RESCHED
happens to be set. So _there_ getting solitary UPROBE is legitimate.
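
The two cases above can be illustrated with a toy model in plain C. This is a
sketch of the control flow as described in this mail, not kernel code: the bit
numbers, notify_resume() and exit_loop() are assumptions made up for the
example, and the never-set 0xe200 bits are left out of the work mask.

```c
#include <assert.h>

/* Bit numbers are assumptions for illustration; only the grouping matters. */
#define NOTIFY_RESUME		(1u << 1)
#define SIGPENDING		(1u << 2)
#define NEED_RESCHED		(1u << 3)
#define MCE_NOTIFY		(1u << 10)
#define USER_RETURN_NOTIFY	(1u << 11)
#define UPROBE			(1u << 12)

#define DO_NOTIFY_MASK \
	(NOTIFY_RESUME | SIGPENDING | MCE_NOTIFY | USER_RETURN_NOTIFY)
/* _TIF_WORK_MASK as expanded above, minus the never-set bits */
#define WORK_MASK	(DO_NOTIFY_MASK | UPROBE | NEED_RESCHED)

/* Stand-in for do_notify_resume(): clears what it handles, UPROBE included. */
static unsigned int notify_resume(unsigned int flags)
{
	return flags & ~(DO_NOTIFY_MASK | UPROBE);
}

/*
 * Toy exit loop.  paranoid == 0 models the syscall/interrupt exit paths,
 * which only notify when something in DO_NOTIFY_MASK is set.  paranoid != 0
 * models paranoid_userspace, which notifies on anything in WORK_MASK
 * besides NEED_RESCHED.  Returns the number of passes taken before
 * leaving, or -1 if still spinning after max_iter passes.
 */
static int exit_loop(unsigned int flags, int paranoid, int max_iter)
{
	unsigned int trigger = paranoid ? (WORK_MASK & ~NEED_RESCHED)
					: DO_NOTIFY_MASK;

	assert(max_iter > 0);
	for (int i = 0; i < max_iter; i++) {
		if (!(flags & WORK_MASK))
			return i;		/* no work left, leave */
		if (flags & NEED_RESCHED)
			flags &= ~NEED_RESCHED;	/* stands in for schedule() */
		else if (flags & trigger)
			flags = notify_resume(flags);
		/* else: work pending, nobody clears it, go around again */
	}
	return -1;
}
```

In this model exit_loop(UPROBE, 0, 1000) returns -1, i.e. it is still spinning
when the pass limit is hit, while exit_loop(UPROBE, 1, 1000) returns 1.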

_TIF_SYSCALL_EMU is also an interesting story - on the way out it
        * forces us onto the iret path
        * does *not* trigger trace_syscall_leave() on its own
          (trace_syscall_leave() is aware of that sucker, though, with a
          rather confusing comment)
        * hits do_notify_resume() (for no good reason - do_notify_resume()
          silently ignores it)
        * gets cleared from the workmask (i.e. %edi), so on the next
          iteration through the loop it gets completely ignored.

AFAICS, all of that is pointless, since SYSCALL_EMU wants to avoid
SYSRET only if we had entered with it and in that case we would've
gone through tracesys and stayed the fsck away from SYSRET path
anyway (similar on 32bit - if we hit syscall_trace_enter(), we
do not rejoin the sysenter path). IOW, no reason for it to be
in _TIF_ALLWORK_MASK...
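
One possible reading of the above, spelled out by name instead of by
0x0000FFFF arithmetic. This is only a sketch, not a proposed patch: the
shortened macro names, ALLWORK_AS_EXPANDED and ALLWORK_NAMED are made up here,
and the bit numbers are assumptions for illustration.

```c
#include <assert.h>

/* Bit numbers are assumptions for illustration; only the grouping matters. */
#define SYSCALL_TRACE		(1u << 0)
#define NOTIFY_RESUME		(1u << 1)
#define SIGPENDING		(1u << 2)
#define NEED_RESCHED		(1u << 3)
#define SINGLESTEP		(1u << 4)
#define SYSCALL_EMU		(1u << 6)
#define SYSCALL_AUDIT		(1u << 7)
#define MCE_NOTIFY		(1u << 10)
#define USER_RETURN_NOTIFY	(1u << 11)
#define UPROBE			(1u << 12)
#define NOHZ			(1u << 19)
#define SYSCALL_TRACEPOINT	(1u << 28)

#define DO_NOTIFY_MASK \
	(NOTIFY_RESUME | SIGPENDING | MCE_NOTIFY | USER_RETURN_NOTIFY)
#define WORK_SYSCALL_EXIT \
	(SYSCALL_TRACE | SYSCALL_AUDIT | SINGLESTEP | \
	 SYSCALL_TRACEPOINT | NOHZ)

/* _TIF_ALLWORK_MASK as expanded earlier in this mail */
#define ALLWORK_AS_EXPANDED \
	(DO_NOTIFY_MASK | WORK_SYSCALL_EXIT | NEED_RESCHED | \
	 SYSCALL_EMU | UPROBE | 0xe200u)

/* Hypothetical variant: never-set bits and SYSCALL_EMU dropped */
#define ALLWORK_NAMED \
	(DO_NOTIFY_MASK | WORK_SYSCALL_EXIT | NEED_RESCHED | UPROBE)

/* The named variant differs from the expansion only by those bits. */
static_assert(ALLWORK_NAMED ==
	      (ALLWORK_AS_EXPANDED & ~(SYSCALL_EMU | 0xe200u)),
	      "named mask must match the expansion minus EMU and 0xe200");
```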