Re: [PATCH v8 4/7] entry: Support Syscall User Dispatch on common syscall entry
From: Andy Lutomirski
Date: Tue Dec 01 2020 - 19:05:37 EST
On Fri, Nov 27, 2020 at 11:33 AM Gabriel Krisman Bertazi
<krisman@xxxxxxxxxxxxx> wrote:
>
> Syscall User Dispatch (SUD) must take precedence over seccomp and
> ptrace, since the use case is emulation (it can be invoked with a
> different ABI) such that seccomp filtering by syscall number doesn't
> make sense in the first place. In addition, either the syscall is
> dispatched back to userspace, in which case there is no resource to
> trace, or the syscall will be executed, and seccomp/ptrace will execute
> next.
>
> Since SUD runs before tracepoints, it needs to be in SYSCALL_WORK_EXIT as
> well, just to prevent a trace exit event when dispatch was triggered.
> For that, on_syscall_dispatch() examines the context to skip the
> tracepoint, audit and other work.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
> Changes since v6:
> - Update do_syscall_intercept signature (Christian Brauner)
> - Move it to before tracepoints
> - Use SYSCALL_WORK flags
> ---
> include/linux/entry-common.h | 2 ++
> kernel/entry/common.c | 17 +++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index 49b26b216e4e..a6e98b4ba8e9 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -44,10 +44,12 @@
> SYSCALL_WORK_SYSCALL_TRACE | \
> SYSCALL_WORK_SYSCALL_EMU | \
> SYSCALL_WORK_SYSCALL_AUDIT | \
> + SYSCALL_WORK_SYSCALL_USER_DISPATCH | \
> ARCH_SYSCALL_WORK_ENTER)
> #define SYSCALL_WORK_EXIT (SYSCALL_WORK_SYSCALL_TRACEPOINT | \
> SYSCALL_WORK_SYSCALL_TRACE | \
> SYSCALL_WORK_SYSCALL_AUDIT | \
> + SYSCALL_WORK_SYSCALL_USER_DISPATCH | \
> ARCH_SYSCALL_WORK_EXIT)
>
> /*
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index f1b12dc32ff4..ec20aba3b890 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -6,6 +6,8 @@
> #include <linux/livepatch.h>
> #include <linux/audit.h>
>
> +#include "common.h"
> +
> #define CREATE_TRACE_POINTS
> #include <trace/events/syscalls.h>
>
> @@ -47,6 +49,16 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> {
> long ret = 0;
>
> + /*
> + * Handle Syscall User Dispatch. This must come first, since
> + * the ABI here can be something that doesn't make sense for
> + * other syscall_work features.
> + */
> + if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> + if (do_syscall_user_dispatch(regs))
> + return -1L;
> + }
> +
> /* Handle ptrace */
> if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
> ret = arch_syscall_enter_tracehook(regs);
> @@ -232,6 +244,11 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> {
> bool step;
>
> + if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
> + if (on_syscall_dispatch())
> + return;
> + }
I think this would be less confusing if you just open-coded the body
of on_syscall_dispatch here and got rid of the helper.
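To make the suggestion concrete, here is a rough userspace model of what the
open-coded exit path might look like. It is only a sketch: it assumes the
helper does nothing more than test and clear a per-task "on_dispatch" flag,
which is a guess, since the helper's body is not part of this patch. The
names task_model, current_task, exit_work_runs and syscall_exit_work_model
are hypothetical stand-ins, not kernel symbols, and the flag values are
placeholders.

```c
#include <stdbool.h>

/* Placeholder bit values; the real SYSCALL_WORK_* bits live in kernel headers. */
#define SYSCALL_WORK_SYSCALL_USER_DISPATCH (1UL << 0)
#define SYSCALL_WORK_SYSCALL_AUDIT         (1UL << 1)

/* Hypothetical stand-in for the per-task state the helper is assumed to inspect. */
struct task_model {
	bool on_dispatch;    /* assumed: set when the entry path dispatched to userspace */
	int exit_work_runs;  /* counts how often the remaining exit work was reached */
};

struct task_model current_task;

/* Model of syscall_exit_work() with the helper's assumed body written inline. */
void syscall_exit_work_model(unsigned long work)
{
	if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
		if (current_task.on_dispatch) {
			/* Consume the flag and skip tracepoint, audit and ptrace work. */
			current_task.on_dispatch = false;
			return;
		}
	}

	/* Stand-in for the audit/tracepoint/ptrace exit reporting. */
	current_task.exit_work_runs++;
}
```

Written this way, the test and the clearing of the flag sit next to the early
return, so the side effect is visible at the call site rather than hidden
behind a predicate-style name.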
--Andy