Re: [RFC PATCH v2 0/3] seccomp: SECCOMP_IOCTL_NOTIF_INJECT for race-free unotify

From: Cong Wang

Date: Tue May 26 2026 - 14:11:02 EST


Hi,

On Thu, May 14, 2026 at 9:27 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> From: Cong Wang <cwang@xxxxxxxxxxxxxx>
>
> This is a complete rework of v1 (PIN_ARGS), reshaped to address the
> review feedback that having every syscall-arg fetch site consult a
> per-task pin pointer is cross-cutting awareness that does not scale.
>
> v1 thread:
> https://lore.kernel.org/lkml/20260504011207.539408-1-xiyou.wangcong@xxxxxxxxx
>
> ## Changes since v1
>
> The previous proposal (SECCOMP_IOCTL_NOTIF_PIN_ARGS) snapshotted
> pointer-arg payloads into kernel buffers and modified four syscall
> fetch sites (getname_flags in fs/namei.c, copy_strings in fs/exec.c,
> move_addr_to_kernel in net/socket.c, import_ubuf in lib/iov_iter.c,
> plus new_sync_read/new_sync_write in fs/read_write.c) so the resumed
> syscall body would consume from the snapshot instead of re-reading
> user memory. The reviewer correctly pointed out that this spreads
> "continue-from-snapshotted-state" awareness across the VFS and the
> kernel in general, and that the right shape for this kind of feature
> is one where the syscall layer does not have to care.
>
> v2 inverts the model. The supervisor no longer pins args for a
> resumed syscall body to consume; it describes a substitute syscall
> (nr + args[6]) whose pointer-shaped args are encoded as byte offsets
> into a kernel-side buffer. On SECCOMP_USER_NOTIF_FLAG_INJECTED, the
> trapped task wakes inside seccomp_do_user_notification(), dispatches
> into a kernel-mode syscall helper (filp_open / kernel_bind /
> kernel_write for v1), and the helper's return value becomes the
> trapped syscall's return value. The trapped task's user mm is never
> re-read for the substituted syscall.
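
To make the offset encoding above concrete, here is a rough userspace
sketch of the idea. It is not the uAPI from the patches: struct
inject_desc, its fields (including ptr_mask), INJECT_BUF_SIZE and both
helpers are hypothetical names chosen for illustration only. The point is
that a pointer-shaped arg is carried as a byte offset into a bounded
buffer, and the consumer validates the offset and NUL termination before
using it, so nothing is ever dereferenced in the trapped task's mm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INJECT_BUF_SIZE 256	/* hypothetical payload limit */
#define INJECT_NARGS    6

/* Hypothetical descriptor of a substitute syscall (illustration only). */
struct inject_desc {
	int32_t  nr;			/* substitute syscall number */
	uint64_t args[INJECT_NARGS];	/* scalar value, or byte offset into buf */
	uint8_t  ptr_mask;		/* bit i set: args[i] is an offset */
	uint32_t buf_len;		/* bytes of buf currently in use */
	char     buf[INJECT_BUF_SIZE];
};

/*
 * Supervisor side: append a NUL-terminated string to buf and record its
 * offset in args[idx]. Returns 0 on success, -1 if it does not fit.
 */
int inject_put_str(struct inject_desc *d, int idx, const char *s)
{
	size_t n = strlen(s) + 1;

	if (idx < 0 || idx >= INJECT_NARGS)
		return -1;
	if (d->buf_len > INJECT_BUF_SIZE || n > INJECT_BUF_SIZE - d->buf_len)
		return -1;

	memcpy(d->buf + d->buf_len, s, n);
	d->args[idx] = d->buf_len;
	d->ptr_mask |= (uint8_t)(1u << idx);
	d->buf_len += (uint32_t)n;
	return 0;
}

/*
 * Consumer side: turn args[idx] back into a string inside buf. Returns
 * NULL if the arg is not marked as an offset, the offset is out of
 * bounds, or no NUL terminator exists before the end of the used area.
 */
const char *inject_get_str(const struct inject_desc *d, int idx)
{
	uint64_t off;

	if (idx < 0 || idx >= INJECT_NARGS || !(d->ptr_mask & (1u << idx)))
		return NULL;

	off = d->args[idx];
	if (d->buf_len > INJECT_BUF_SIZE || off >= d->buf_len)
		return NULL;
	if (!memchr(d->buf + off, '\0', d->buf_len - off))
		return NULL;

	return d->buf + off;
}
```

In this sketch a supervisor substituting an open-style call would zero
the descriptor, set nr, store the path with inject_put_str(&d, 1, path),
and fill the scalar args directly. A dispatcher would then resolve the
path with inject_get_str() and hand it to the matching kernel-mode
helper, whose return value becomes the trapped syscall's return value.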

Please let me know your thoughts on this v2 design. I would like to
get feedback before removing the RFC tag.

Thanks!