Re: [RFC PATCH v3 2/3] seccomp: add kernel-installed pinned-memfd redirect
From: Andy Lutomirski
Date: Wed Jun 24 2026 - 16:15:32 EST
On Wed, Jun 24, 2026 at 1:11 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 23, 2026 at 8:39 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 23, 2026 at 3:21 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > The implementation of the patch is a bit sad -- there is no technical
> > > reason that the code couldn't actually issue the syscall and avoid all
> > > the heap and task-work crud. The kernel isn't really structured like
> > > that, but I don't think it would be a big departure.
> >
> > You are right. I have redesigned it a bit again.
> >
> > SEND_REDIRECT is replaced with SECCOMP_IOCTL_NOTIF_RUN:
> > instead of rewriting the trapped task's argument registers and resuming
> > the same syscall, the supervisor names an explicit {nr, args}, and seccomp
> > issues *that* syscall in the target's context, reports its return value as the
> > trapped syscall's result, and skips the original. It runs in the target
> > (its mm/creds/fds), just dispatched into a scratch pt_regs frame so the
> > task's real registers are never touched.
>
> What is a "scratch pt_regs frame"? What happens to any code in the
> kernel that touches current_pt_regs()?
>
> Even if you tried to make a list of syscalls that don't inherently
> mess with current_pt_regs() or other task state that might be
> relevant, what happens if there's a tracer that expects
> syscall_get_args and such to work?
Don't forget about in_x32_syscall(), which is extremely relevant if
you allow changing nr and don't reflect it in pt_regs.
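
To make the x32 concern concrete, here is a userspace toy model, not kernel code. It assumes only that in_x32_syscall() decides the ABI by testing __X32_SYSCALL_BIT (0x40000000) in the task's saved orig_ax. The struct and every function name below are hypothetical stand-ins for illustration.

```c
#include <stdbool.h>

/* Same value as __X32_SYSCALL_BIT in the x86 UAPI headers. */
#define X32_SYSCALL_BIT 0x40000000UL

/* Hypothetical stand-in for the one saved-register field that matters here. */
struct x32_model_regs {
	unsigned long orig_ax;	/* syscall nr the task actually entered with */
};

/* Models in_x32_syscall(): the answer comes from the saved registers. */
static bool model_in_x32_syscall(const struct x32_model_regs *task_regs)
{
	return (task_regs->orig_ax & X32_SYSCALL_BIT) != 0;
}

/* What the ABI check would say if it looked at the dispatched nr instead. */
static bool nr_is_x32(unsigned long nr)
{
	return (nr & X32_SYSCALL_BIT) != 0;
}

/*
 * True when the syscall being dispatched and the task's saved registers
 * disagree about the ABI, i.e. callee code asking "am I x32?" gets an
 * answer that describes the trapped syscall, not the one actually running.
 */
static bool abi_mismatch(const struct x32_model_regs *task_regs,
			 unsigned long run_nr)
{
	return model_in_x32_syscall(task_regs) != nr_is_x32(run_nr);
}
```

With orig_ax holding a plain 64-bit nr and a supervisor-supplied nr that has the x32 bit set, abi_mismatch() returns true. The same holds in the other direction.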
>
> >
> > Because nothing is rewritten, there is nothing to restore, the
> > task_work/heap fixup is deleted, and with it all of the signal corner
> > cases.
>
> Really? What happens to -ERESTARTxyz?
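
A rough sketch of why the restart question matters, again as a userspace toy model rather than kernel code. It assumes the usual x86 restart step (copy orig_ax back into ax and back ip up over the 2-byte syscall instruction) and assumes, hypothetically, that the dispatched syscall's -ERESTARTSYS result lands in the task's real ax. It ignores the SA_RESTART and handler checks that the real signal code performs. All names are made up for illustration.

```c
/* Kernel-internal ERESTARTSYS value (never meant to reach userspace). */
#define MODEL_ERESTARTSYS 512

/* Hypothetical stand-in for the saved registers involved in a restart. */
struct restart_model_regs {
	long ax;		/* return value slot, reloaded with nr on restart */
	long orig_ax;		/* syscall nr the task entered with */
	unsigned long ip;	/* user instruction pointer after the syscall */
};

/* Models the restart step: re-arm the syscall from the saved orig_ax. */
static void model_restart(struct restart_model_regs *regs)
{
	if (regs->ax == -MODEL_ERESTARTSYS) {
		regs->ax = regs->orig_ax;	/* nr to re-issue */
		regs->ip -= 2;			/* re-execute the syscall insn */
	}
}

/*
 * Returns what ends up in ax after the restart step when the task trapped
 * on trapped_nr and the dispatched syscall produced run_result.
 */
static long ax_after_restart(long trapped_nr, long run_result)
{
	struct restart_model_regs regs = {
		.ax = run_result,
		.orig_ax = trapped_nr,
		.ip = 0x1000,
	};

	model_restart(&regs);
	return regs.ax;
}
```

In this model, a restart re-issues the trapped nr with the original arguments, not the {nr, args} the supervisor named, because the real registers were never updated to describe the substituted syscall.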
>
> >
> > Please let me know if this looks better to you too.
> >
> > Regards,
> > Cong
>
>
>
>