Re: [PATCH v5 1/2] mm/process_vm_access: pidfd and nowait support for process_vm_readv/writev
From: David Hildenbrand (Arm)
Date: Tue Jun 02 2026 - 08:23:26 EST
On 6/2/26 12:09, Alban Crequy wrote:
> From: Alban Crequy <albancrequy@xxxxxxxxxxxxx>
>
> There are two categories of users for process_vm_readv:
>
> 1. Debuggers like GDB or strace.
>
> When a debugger attempts to read the target memory and triggers a
> page fault, the page fault needs to be resolved so that the debugger
> can accurately interpret the memory. A debugger is typically attached
> to a single process.
>
> 2. Profilers like OpenTelemetry eBPF Profiler.
>
> The profiler uses a perf event to get stack traces from all
> processes at 20Hz (20 stack traces to resolve per second). For
> interpreted languages (Ruby, Python, etc.), the profiler uses
> process_vm_readv to get the correct symbols. In this case,
> performance matters most. It is fine if some stack traces
> cannot be resolved as long as it is not statistically significant.
>
> The current behaviour of process_vm_readv is to resolve page faults in
> the target VM. This is as desired for debuggers, but unwelcome for
> profilers because the page fault resolution could take a lot of time
> depending on the backing filesystem. Additionally, since profilers
> monitor all processes, we don't want a slow page fault resolution for
> one target process slowing down the monitoring for all other target
> processes.
>
> This patch adds the flag PROCESS_VM_NOWAIT, so the caller can choose to
> not block on IO if the memory access causes a page fault. When a page
> is not resident and would require IO to fault in, the syscall returns
> a short read (the number of bytes successfully read before the fault)
> or -1 with errno set to EFAULT if no bytes were read.
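
For illustration, a minimal sketch of how a profiler might consume these
semantics: a full read is a usable sample, and a short read or EFAULT means
the sample is dropped rather than waited for. The PROCESS_VM_NOWAIT value
below is a placeholder assumption (the real value comes from the patched
uapi header), and classify_read() and read_remote_nowait() are hypothetical
helper names, not part of the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* ASSUMPTION: placeholder value, the real one is defined by the patched uapi header. */
#ifndef PROCESS_VM_NOWAIT
#define PROCESS_VM_NOWAIT (1UL << 1)
#endif

enum sample_status { SAMPLE_OK, SAMPLE_DROPPED, SAMPLE_ERROR };

/* Map the syscall result onto what a profiler would do with the sample. */
enum sample_status classify_read(ssize_t got, size_t want, int err)
{
	if (got >= 0 && (size_t)got == want)
		return SAMPLE_OK;       /* everything was resident */
	if (got >= 0)
		return SAMPLE_DROPPED;  /* short read: stopped at a page needing IO */
	if (err == EFAULT)
		return SAMPLE_DROPPED;  /* nothing read (also covers a bad address) */
	return SAMPLE_ERROR;            /* e.g. ESRCH, EPERM, EINVAL */
}

/* Hypothetical helper: one non-blocking read from the target's memory. */
enum sample_status read_remote_nowait(pid_t pid, uintptr_t remote_addr,
				      void *buf, size_t len)
{
	struct iovec local = { .iov_base = buf, .iov_len = len };
	struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };
	long got = syscall(SYS_process_vm_readv, (long)pid, &local, 1UL,
			   &remote, 1UL, PROCESS_VM_NOWAIT);

	return classify_read((ssize_t)got, len, errno);
}
```

On a kernel without the patch, the call would fail with EINVAL and the
sketch reports SAMPLE_ERROR, so a caller would likely want to probe for the
flag once at startup and fall back to flags == 0.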
>
> Additionally, this patch adds the flag PROCESS_VM_PIDFD to refer to the
> remote process via PID file descriptor instead of PID. Such a file
> descriptor can be obtained with pidfd_open(2). This is useful to avoid
> the pid number being reused. It is unlikely to happen for debuggers
> because they can monitor the target process termination in other ways
> (ptrace), but can be helpful in some profiling scenarios. When using
> PROCESS_VM_PIDFD, the first argument is a pidfd instead of a pid. If
> the pidfd is invalid, the syscall returns -1 with errno set to EBADF.
>
> If a given flag is unsupported, the syscall returns the error EINVAL
> without checking the buffers. This gives userspace a way to detect
> whether the current kernel supports a specific flag:
>
>     process_vm_readv(pid, NULL, 1, NULL, 1, PROCESS_VM_PIDFD)
>     -> EINVAL if the kernel does not support the flag PROCESS_VM_PIDFD
>        (before this patch)
>     -> EFAULT if the kernel supports the flag (after this patch)
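
The probe above could be wrapped roughly as follows. This is only a sketch
under assumptions: the PROCESS_VM_PIDFD value is a placeholder (the real one
comes from the patched uapi header), classify_probe() and probe_flag() are
hypothetical names, and the first argument follows the commit message's
example literally.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: placeholder value, the real one is defined by the patched uapi header. */
#ifndef PROCESS_VM_PIDFD
#define PROCESS_VM_PIDFD (1UL << 0)
#endif

enum flag_support { FLAG_SUPPORTED, FLAG_UNSUPPORTED, FLAG_UNKNOWN };

/* Interpret the probe result the way the commit message describes it. */
enum flag_support classify_probe(long ret, int err)
{
	if (ret == -1 && err == EFAULT)
		return FLAG_SUPPORTED;    /* flag accepted, NULL iovec rejected */
	if (ret == -1 && err == EINVAL)
		return FLAG_UNSUPPORTED;  /* flag rejected before buffers are checked */
	return FLAG_UNKNOWN;              /* e.g. ENOSYS, or EPERM under seccomp */
}

/* Issue the probe: NULL iovecs with a count of 1, plus the flag under test. */
enum flag_support probe_flag(unsigned long flag)
{
	long ret = syscall(SYS_process_vm_readv, (long)getpid(), NULL, 1UL,
			   NULL, 1UL, flag);

	return classify_probe(ret, errno);
}
```

A caller would run probe_flag(PROCESS_VM_PIDFD) once and cache the answer;
FLAG_UNKNOWN is kept separate so that a sandboxed environment is not
mistaken for an old kernel.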
>
> Suggested man page update for process_vm_readv(2):
>
> The flags argument is the bitwise OR of zero or more of these flags:
>
> PROCESS_VM_PIDFD (since Linux 7.x)
> The pid argument is a PID file descriptor (see pidfd_open(2))
> instead of a PID number. When using this flag, the existing
> ESRCH error applies if the process referred to by the pidfd
> has exited.
>
> PROCESS_VM_NOWAIT (since Linux 7.x)
> Do not block on IO. If a page in the remote address space is not
> resident and would require disk IO to fault in, the system call
> returns a short read or fails with EFAULT if no bytes were read.
>
> Additional error:
>
> EBADF pid is not a valid file descriptor (PROCESS_VM_PIDFD only).
>
> Signed-off-by: Alban Crequy <albancrequy@xxxxxxxxxxxxx>
> ---
Nothing jumped out at me, thanks!
Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
--
Cheers,
David