Re: [PATCH v4 2/2] fanotify: allow reporting pidfds for reaped tasks
From: AnonymeMeow
Date: Thu Jun 04 2026 - 16:31:51 EST
On 2026-06-03 13:25 +0200, Christian Brauner wrote:
> On Wed, Jun 03, 2026 at 12:58:09PM +0200, Amir Goldstein wrote:
> > On Wed, Jun 3, 2026 at 12:28 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Jun 03, 2026 at 12:00:42PM +0200, Amir Goldstein wrote:
> > > > On Wed, Jun 3, 2026 at 5:11 AM AnonymeMeow <anonymemeow@xxxxxxxxx> wrote:
> > > > >
> > > > > Sashiko said that pidfs_register_pid() can fail under memory pressure,
> > > > > since it allocates memory using GFP_KERNEL without a way to pass in
> > > > > __GFP_NOFAIL. This can cause fanotify_alloc_event() to silently drop
> > > > > the event even when the group has an unlimited queue length.
> > > > >
> > > > > Should we make pidfs registration best-effort instead and still return
> > > > > the allocated event if pidfs_register_pid() fails? But IMO this will
> > > > > make the API semantically ambiguous. If the kernel guarantees that the
> > > > > event pid is registered with pidfs when pidfd reporting is requested,
> > > > > then a later FAN_NOPIDFD or -ESRCH can only be caused by the task being
> > > > > invisible within the reader's pid namespace. But if we allow best-effort
> > > > > registration, the event may still carry a pinned struct pid, but pidfs
> > > > > registration may have failed due to memory pressure. If userspace reads
> > > > > the event after the task has been reaped, it still gets a FAN_NOPIDFD
> > > > > or -ESRCH due to failed pidfs registration caused by memory pressure,
> > > > > IMO this should be reported as a -ENOMEM instead of -ESRCH. But
> > > > > currently it seems we don't have a good way to register the pid with
> > > > > __GFP_NOFAIL or __GFP_RETRY_MAYFAIL, so what would be the preferred
> > > > > approach to handle this situation?
> > > >
> > > > Thanks for noticing the oddity and explaining the options.
> > > >
> > > > We could fail the event in case on ENOMEM because the system
> > > > is likely is a bad shape anyway and other events are likely to fail
> > > > allocation themselves, so I am not objecting to your patch as is,
> > > >
> > > > but I am slightly leaning towards the semantically ambiguous API.
> > >
> > > As usual, I think a semantically ambiguous api should be avoided.
> > >
> > > > I don't think that we are going to change the man page to say that
> > > > FAN_REPORT_PIDFD is guaranteed to return a pidfd, maybe we just
> > > > add a NOTE about increased likelihood of getting a pidfd since kernel XXX
> > > > and mention the known cases of pid namespace and ENOMEM.
> > >
> > > The pid namespace shouldn't matter.
> >
> > DO you mean that the event reader inside a pidns always gets a pidfd but
> > it will not be able to query the pid of the process outside the pidns
> > or do you mean something else?
>
> So they will always get a pidfd since fanotify is correctly using
> pidfd_prepare() directly. But when the reader tries to resolve the pidfd
> to say a pid or retrieve info via the pidfd info ioctl they would fail
> to do so (well, the pid will resolve to 0 which is the global indicator
> for "can't be resolved in your pidns").
>
Before pidfd_prepare(), fanotify checks whether metadata.pid is
non-zero, i.e. whether the event pid is visible in the reader's pidns.
If it's not visible, fanotify will not create a valid pidfd and return a
FAN_NOPIDFD. This is the existing behavior, and my patches do not change
it.