Re: [PATCH v2] kernel: Implement selective syscall userspace redirection

From: Gabriel Krisman Bertazi
Date: Thu Jul 09 2020 - 14:36:59 EST


Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:

> On Thu, Jul 09, 2020 at 12:38:40AM -0400, Gabriel Krisman Bertazi wrote:
>> The proposed interface looks like this:
>>
>> prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <dispatcher>, [selector])
>>
>> Dispatcher is the address of a syscall instruction that is allowed to
>> bypass the blockage, such that in fast paths you don't need to disable
>> the trap nor check the selector. This is essential to return from
>> SIGSYS to a blocked area without triggering another SIGSYS from the
>> rt_sigreturn.
>
> Should <dispatcher> be a single pointer or should the interface specify
> a range from which syscalls may be made without being redirected? eg,
> one could specify the whole of libc.
>
> prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start>, <inclusive-end>,
> [selector])

I like this suggestion a lot: a user can still pass a single address
(start == end) to get the original interface, while a range lets us
avoid paying the cost of __get_user on more paths. I will add it to v3.
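For reference, the check a range implies stays cheap. Below is a minimal userspace sketch, assuming the kernel compares the address of the syscall site against an inclusive [start, end] range. `struct dispatch_range` and `syscall_bypasses_dispatch()` are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state for the range-based interface. */
struct dispatch_range {
	unsigned long start;	/* first address allowed to bypass redirection */
	unsigned long end;	/* last allowed address, inclusive */
};

/* Return true if a syscall issued from 'addr' should not be redirected. */
static bool syscall_bypasses_dispatch(const struct dispatch_range *r,
				      unsigned long addr)
{
	/* The prctl() setter is assumed to reject inverted ranges. */
	assert(r->start <= r->end);
	return addr >= r->start && addr <= r->end;
}
```

An inclusive end lets a range reach the top of the address space without overflowing, and start == end degenerates to the single-address dispatcher of v2.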

>
>> +++ b/include/linux/syscall_user_dispatch.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _SYSCALL_USER_DISPATCH_H
>> +#define _SYSCALL_USER_DISPATCH_H
>> +
>> +struct task_struct;
>> +static void clear_tsk_thread_flag(struct task_struct *tsk, int flag);
>> +
>> +#ifdef CONFIG_SYSCALL_USER_DISPATCH
>> +struct syscall_user_dispatch {
>> + int __user *selector;
>> + unsigned long __user dispatcher;
>
> The __user annotation is on the pointer, not the value. i.e., it's
>
> unsigned long foo;
> unsigned long __user *p;
>
> get_user(foo, p)
>
>> +++ b/include/uapi/asm-generic/siginfo.h
>> @@ -285,6 +285,7 @@ typedef struct siginfo {
>> */
>> #define SYS_SECCOMP 1 /* seccomp triggered */
>> #define NSIGSYS 1
>> +#define SYS_USER_REDIRECT 2
>
> I'd suggest that SYS_USER_REDIRECT should be moved up by one line.
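A sketch of the resulting block, assuming NSIGSYS is meant to record the highest valid SIGSYS si_code, as the other NSIG* constants in that header do; under that assumption it also needs bumping to 2, which the review does not spell out. The `sigsys_code_is_known()` helper is hypothetical and only illustrates how such an upper bound gets consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed ordering: new codes sit above the NSIGSYS bound. */
#define SYS_SECCOMP		1	/* seccomp triggered */
#define SYS_USER_REDIRECT	2	/* syscall user dispatch triggered */
#define NSIGSYS			2	/* assumed: highest valid SIGSYS si_code */

static_assert(SYS_USER_REDIRECT <= NSIGSYS,
	      "NSIGSYS must cover every SIGSYS si_code");

/* Hypothetical consumer that trusts NSIGSYS as an upper bound. */
static bool sigsys_code_is_known(int si_code)
{
	return si_code >= SYS_SECCOMP && si_code <= NSIGSYS;
}
```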
>
>> +int set_syscall_user_dispatch(int mode, unsigned long __user dispatcher,
>> + int __user *selector)
>> +{
>> + switch (mode) {
>> + case PR_SYSCALL_DISPATCH_DISABLE:
>> + if (dispatcher || selector)
>> + return -EINVAL;
>> + break;
>> + case PR_SYSCALL_DISPATCH_ENABLE:
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> +
>> + if (selector) {
>> + if (!access_ok(selector, sizeof(int)))
>> + return -EFAULT;
>> + }
>
> You're not enforcing the alignment requirement here.
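One way to enforce it is a natural-alignment check next to access_ok(), sketched below in userspace. The -EINVAL return is an assumption (the review does not say which error is wanted), `check_selector()` is a hypothetical name, and `IS_ALIGNED()` mimics the kernel macro of the same name. The selector is taken as an unsigned long, the way prctl() arguments arrive, so no misaligned int pointer is ever formed.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-in for the kernel's IS_ALIGNED(); 'a' must be a power of two. */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

static_assert((sizeof(int) & (sizeof(int) - 1)) == 0,
	      "IS_ALIGNED() needs a power-of-two alignment");

/* A NULL selector is allowed; otherwise it must be int-aligned. */
static int check_selector(unsigned long selector)
{
	if (!selector)
		return 0;
	if (!IS_ALIGNED(selector, sizeof(int)))
		return -EINVAL;
	/* The kernel version would follow this with access_ok(). */
	return 0;
}
```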
>
>> + spin_lock_irq(&current->sighand->siglock);
>> +
>> + current->syscall_dispatch.selector = selector;
>> + current->syscall_dispatch.dispatcher = dispatcher;
>> +
>> + /* make sure fastlock is committed before setting the flag. */
>
> fastlock? ;-)

Gee, keeping comments in sync with variable renames is hard; the
compiler won't catch those. :)

> I don't think you actually need this. You're setting per-thread state on
> yourself, so what's the race that you're concerned about?

Good point. I was assuming this state could be modified from under
the task, but that is not the case.

>
>> + smp_mb__before_atomic();
>> +
>> + if (mode == PR_SYSCALL_DISPATCH_ENABLE)
>> + set_tsk_thread_flag(current, TIF_SYSCALL_USER_DISPATCH);
>> + else
>> + clear_tsk_thread_flag(current, TIF_SYSCALL_USER_DISPATCH);
>> +
>> + spin_unlock_irq(&current->sighand->siglock);
>> +
>> + return 0;
>> +}
>> --
>> 2.27.0
>>

--
Gabriel Krisman Bertazi