Re: [PATCH v2] kernel: Implement selective syscall userspace redirection

From: Matthew Wilcox
Date: Thu Jul 09 2020 - 07:49:18 EST


On Thu, Jul 09, 2020 at 12:38:40AM -0400, Gabriel Krisman Bertazi wrote:
> The proposed interface looks like this:
>
> prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <dispatcher>, [selector])
>
> Dispatcher is the address of a syscall instruction that is allowed to
> by-pass the blockage, such that in fast paths you don't need to disable
> the trap nor check the selector. This is essential to return from
> SIGSYS to a blocked area without triggering another SIGSYS from the
> rt_sigreturn.

Should <dispatcher> be a single pointer, or should the interface specify
a range from which syscalls may be made without being redirected? E.g.,
one could specify the whole of libc.

prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start>, <inclusive-end>, [selector])
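
For illustration only, the check a range-based variant would need at
syscall entry could look roughly like the sketch below. The struct and
function names are hypothetical and do not come from the patch; this is
plain C that builds in userspace, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for the range variant; not part of the patch. */
struct dispatch_range {
	uintptr_t start;	/* first address allowed to issue syscalls */
	uintptr_t end;		/* last allowed address, inclusive */
};

/* Return true if a syscall issued at 'ip' should bypass redirection. */
static bool ip_in_dispatch_range(const struct dispatch_range *r, uintptr_t ip)
{
	return ip >= r->start && ip <= r->end;
}
```

With an inclusive end, a range of { 0x1000, 0x1fff } lets through
syscalls issued at 0x1000 and 0x1fff and redirects one issued at 0x2000.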

> +++ b/include/linux/syscall_user_dispatch.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _SYSCALL_USER_DISPATCH_H
> +#define _SYSCALL_USER_DISPATCH_H
> +
> +struct task_struct;
> +static void clear_tsk_thread_flag(struct task_struct *tsk, int flag);
> +
> +#ifdef CONFIG_SYSCALL_USER_DISPATCH
> +struct syscall_user_dispatch {
> + int __user *selector;
> + unsigned long __user dispatcher;

The __user annotation goes on the pointer, not the value, i.e. it's

unsigned long foo;
unsigned long __user *p;

get_user(foo, p);

> +++ b/include/uapi/asm-generic/siginfo.h
> @@ -285,6 +285,7 @@ typedef struct siginfo {
> */
> #define SYS_SECCOMP 1 /* seccomp triggered */
> #define NSIGSYS 1
> +#define SYS_USER_REDIRECT 2

I'd suggest moving SYS_USER_REDIRECT up by one line, so that it sits
next to SYS_SECCOMP and ahead of NSIGSYS.

> +int set_syscall_user_dispatch(int mode, unsigned long __user dispatcher,
> + int __user *selector)
> +{
> + switch (mode) {
> + case PR_SYSCALL_DISPATCH_DISABLE:
> + if (dispatcher || selector)
> + return -EINVAL;
> + break;
> + case PR_SYSCALL_DISPATCH_ENABLE:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + if (selector) {
> + if (!access_ok(selector, sizeof(int)))
> + return -EFAULT;
> + }

You're not enforcing the alignment requirement here.
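
A minimal sketch of such a check, assuming the requirement is natural
alignment for an int (that is my assumption; the hunk quoted here does
not spell it out). The helper name is hypothetical, and this is a
userspace illustration. In the kernel the equivalent would presumably
be an IS_ALIGNED() test next to the access_ok() call.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if 'selector' is aligned to sizeof(int).
 * Assumption: the selector must be naturally aligned for an int.
 */
static bool selector_is_aligned(const void *selector)
{
	return ((uintptr_t)selector % sizeof(int)) == 0;
}
```

A misaligned selector would then be rejected up front rather than being
accepted and read later on every syscall entry.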

> + spin_lock_irq(&current->sighand->siglock);
> +
> + current->syscall_dispatch.selector = selector;
> + current->syscall_dispatch.dispatcher = dispatcher;
> +
> + /* make sure fastlock is committed before setting the flag. */

fastlock? ;-)
I don't think you actually need this. You're setting per-thread state on
yourself, so what's the race that you're concerned about?

> + smp_mb__before_atomic();
> +
> + if (mode == PR_SYSCALL_DISPATCH_ENABLE)
> + set_tsk_thread_flag(current, TIF_SYSCALL_USER_DISPATCH);
> + else
> + clear_tsk_thread_flag(current, TIF_SYSCALL_USER_DISPATCH);
> +
> + spin_unlock_irq(&current->sighand->siglock);
> +
> + return 0;
> +}
> --
> 2.27.0
>