Re: [PATCH v2 1/3] syscall_user_dispatch: Allow allowed range wrap-around

From: Dmitry Vyukov
Date: Sat Mar 08 2025 - 05:00:47 EST


On Mon, 24 Feb 2025 at 09:45, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> There are two possible scenarios for syscall filtering:
> - having a trusted/allowed range of PCs, and intercepting everything else
> - or the opposite: a single untrusted/intercepted range and allowing
> everything else
> The current implementation only allows the former use case due to
> the allowed range wrap-around check. Allow the latter use case as well
> by removing the wrap-around check.
> The latter use case is relevant for any kind of sandboxing scenario,
> or for monitoring the behavior of a single library. If a program wants
> to intercept syscalls for PC range [START, END) then it needs to call:
> prctl(..., END, -(END-START), ...);
> which sets a wrap-around range that excludes everything
> besides [START, END).
>
> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Gregory Price <gregory.price@xxxxxxxxxxxx>
> Cc: Marco Elver <elver@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx

Any remaining concerns with this series?

Are syscall_user_dispatch patches pulled via x86 tree?
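
For anyone double-checking the arithmetic in the commit message, here is
a minimal userspace sketch of the check. It only mirrors the form of the
kernel comparison with 64-bit unsigned values; struct sud_range,
intercept_only() and the addresses are hypothetical names and values
made up for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Same form as the kernel check: true means the syscall at `pc` is
 * allowed (not dispatched). Unsigned arithmetic wraps modulo 2^64.
 */
static bool pc_allowed(uint64_t pc, uint64_t offset, uint64_t len)
{
	return pc - offset < len;
}

/* Hypothetical container for the prctl() offset/len arguments. */
struct sud_range {
	uint64_t offset;
	uint64_t len;
};

/*
 * Hypothetical helper: offset/len values that intercept only
 * [start, end), following the recipe in the commit message.
 */
static struct sud_range intercept_only(uint64_t start, uint64_t end)
{
	struct sud_range r = { .offset = end, .len = -(end - start) };
	return r;
}

/* Illustrative addresses only. */
static void wraparound_example(void)
{
	struct sud_range r = intercept_only(0x1000, 0x2000);

	assert(pc_allowed(0x0fff, r.offset, r.len));	/* below START: allowed */
	assert(!pc_allowed(0x1000, r.offset, r.len));	/* START: intercepted */
	assert(!pc_allowed(0x1fff, r.offset, r.len));	/* END - 1: intercepted */
	assert(pc_allowed(0x2000, r.offset, r.len));	/* END: allowed */
}
```

The allowed range starts at END and wraps through zero up to START, so
only PCs inside [START, END) fail the comparison.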

> ---
> kernel/entry/syscall_user_dispatch.c | 9 +++------
> kernel/sys.c | 6 ++++++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
> index 5340c5aa89e7d..a0659f0515404 100644
> --- a/kernel/entry/syscall_user_dispatch.c
> +++ b/kernel/entry/syscall_user_dispatch.c
> @@ -37,6 +37,7 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
>  	char state;
>
> +	/* Note: this check form allows for range wrap-around. */
>  	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
>  		return false;
>
> @@ -80,13 +81,9 @@ static int task_set_syscall_user_dispatch(struct task_struct *task, unsigned lon
>  		break;
>  	case PR_SYS_DISPATCH_ON:
>  		/*
> -		 * Validate the direct dispatcher region just for basic
> -		 * sanity against overflow and a 0-sized dispatcher
> -		 * region. If the user is able to submit a syscall from
> -		 * an address, that address is obviously valid.
> +		 * Note: we don't check and allow arbitrary values for
> +		 * offset/len in particular to allow range wrap-around.
>  		 */
> -		if (offset && offset + len <= offset)
> -			return -EINVAL;
>
>  		/*
>  		 * access_ok() will clear memory tags for tagged addresses
> diff --git a/kernel/sys.c b/kernel/sys.c
> index cb366ff8703af..666322026ad72 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2735,6 +2735,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  		error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
>  		break;
>  	case PR_SET_SYSCALL_USER_DISPATCH:
> +		/*
> +		 * Sign-extend len for 32-bit processes to allow region
> +		 * wrap-around.
> +		 */
> +		if (in_compat_syscall())
> +			arg4 = (long)(s32)arg4;
>  		error = set_syscall_user_dispatch(arg2, arg3, arg4,
>  						  (char __user *) arg5);
>  		break;
> --
> 2.48.1.601.g30ceb7b040-goog
>
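
To illustrate why the compat path sign-extends len, here is a small
userspace model of the cast. It assumes a 64-bit kernel where the 32-bit
argument arrives zero-extended, and it relies on the usual gcc/clang
behaviour for narrowing to a signed type. sign_extend_compat_len() is a
hypothetical name; the kernel code is just the one-line cast above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models `arg4 = (long)(s32)arg4;` with fixed-width types, assuming
 * unsigned long is 64 bits wide in the kernel.
 */
static uint64_t sign_extend_compat_len(uint64_t arg4)
{
	return (uint64_t)(int64_t)(int32_t)arg4;
}

static void compat_len_example(void)
{
	/* A 32-bit task computes len = -(END - START) in 32 bits. */
	uint32_t len32 = -(uint32_t)0x1000;	/* 0xfffff000 */
	uint64_t raw = len32;			/* zero-extended argument */

	assert(raw == 0x00000000fffff000ULL);
	/* After sign extension the length also wraps in 64-bit terms. */
	assert(sign_extend_compat_len(raw) == 0xfffffffffffff000ULL);
}
```

Without the cast, the 64-bit comparison would see a len just under 4 GiB
rather than a negative length, so the range would not wrap as the 32-bit
caller intended.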