Re: [PATCH v3 0/2] seccomp: pass uretprobe system call through seccomp
From: Jiri Olsa
Date: Fri Feb 07 2025 - 19:04:11 EST
On Fri, Feb 07, 2025 at 04:27:09PM +0100, Jann Horn wrote:
> On Sun, Feb 2, 2025 at 5:29 PM Eyal Birger <eyal.birger@xxxxxxxxx> wrote:
> > uretprobe(2) is a performance enhancement system call added to improve
> > uretprobes on x86_64.
> >
> > Confinement environments such as Docker are not aware of this new system
> > call and kill confined processes when uretprobes are attached to them.
>
> FYI, you might have similar issues with Syscall User Dispatch
> (https://docs.kernel.org/admin-guide/syscall-user-dispatch.html) and
> potentially also with ptrace-based sandboxes, depending on what kinda
> processes you inject uprobes into. For Syscall User Dispatch, there is
> already precedent for a bypass based on instruction pointer (see
> syscall_user_dispatch()).
>
> > Since uretprobe is a "kernel implementation detail" system call which is
> > not used by userspace application code directly, pass this system call
> > through seccomp without forcing existing userspace confinement environments
> > to be changed.
>
> This makes me feel kinda uncomfortable. The purpose of seccomp() is
> that you can create a process that is as locked down as you want; you
> can use it for some light limits on what a process can do (like in
> Docker), or you can use it to make a process that has access to
> essentially nothing except read(), write() and exit_group(). Even
> stuff like restart_syscall() and rt_sigreturn() is not currently
> excepted from that.
>
> I guess your use case is a little special in that you were already
> calling from userspace into the kernel with SWBP before, which is also
> not subject to seccomp; and the syscall is essentially an
> arch-specific hack to make the SWBP a little faster.
>
> If we do this, we should at least ensure that there is absolutely no
> way for anything to happen in sys_uretprobe when no uretprobes are
> configured for the process - the first check in the syscall
> implementation almost does that, but the implementation could be a bit
> stricter. It checks for "regs->ip != trampoline_check_ip()", but if no
> uprobe region exists for the process, trampoline_check_ip() returns
> `-1 + (uretprobe_syscall_check - uretprobe_trampoline_entry)`. So
> there is a userspace instruction pointer near the bottom of the
> address space that is allowed to call into the syscall if uretprobes
> are not set up. Though the mmap minimum address restrictions will
> typically prevent creating mappings there, and
> uprobe_handle_trampoline() will SIGILL us if we get that far without a
> valid uretprobe.
nice catch, I think the change below should fix that
thanks,
jirka
---
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 0c74a4d4df65..9b8837d8f06e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -368,19 +368,21 @@ void *arch_uretprobe_trampoline(unsigned long *psize)
 	return &insn;
 }
 
-static unsigned long trampoline_check_ip(void)
+static unsigned long trampoline_check_ip(unsigned long tramp)
 {
-	unsigned long tramp = uprobe_get_trampoline_vaddr();
-
 	return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
 }
 
 SYSCALL_DEFINE0(uretprobe)
 {
 	struct pt_regs *regs = task_pt_regs(current);
-	unsigned long err, ip, sp, r11_cx_ax[3];
+	unsigned long err, ip, sp, r11_cx_ax[3], tramp;
+
+	tramp = uprobe_get_trampoline_vaddr();
+	if (tramp == -1)
+		goto sigill;
 
-	if (regs->ip != trampoline_check_ip())
+	if (regs->ip != trampoline_check_ip(tramp))
 		goto sigill;
 
 	err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));