Re: [PATCH v4 2/3] kernfs: Send IN_DELETE_SELF and IN_IGNORED
From: Amir Goldstein
Date: Fri Feb 20 2026 - 12:16:48 EST
On Fri, Feb 20, 2026 at 4:32 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, Feb 19, 2026 at 09:54:47PM -0800, T.J. Mercier wrote:
> > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > removed. This means inotify watches persist after file deletion until
> > the process exits and the inotify file descriptor is cleaned up, or
> > until inotify_rm_watch is called manually.
> >
> > This creates a problem for processes monitoring cgroups. For example, a
> > service monitoring memory.events for memory.high breaches needs to know
> > when a cgroup is removed to clean up its state. Where it's known that a
> > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > service must resort to inefficient workarounds such as:
> > 1) Periodically scanning procfs to detect process death (wastes CPU
> > and is susceptible to PID reuse).
> > 2) Holding a pidfd for every monitored cgroup (can exhaust file
> > descriptors).
> >
> > This patch enables IN_DELETE_SELF and IN_IGNORED events for kernfs files
> > and directories by clearing inode i_nlink values during removal. This
> > allows VFS to make the necessary fsnotify calls so that userspace
> > receives the inotify events.
> >
> > As a result, applications can rely on a single existing watch on a file
> > of interest (e.g. memory.events) to receive notifications for both
> > modifications and the eventual removal of the file, as well as automatic
> > watch descriptor cleanup, simplifying userspace logic and improving
> > efficiency.
> >
> > There is a gap in this implementation for certain file removals due to their
> > unique nature in kernfs. Directory removals that trigger file removals
> > occur through vfs_rmdir, which shrinks the dcache and emits fsnotify
> > events after the rmdir operation; there is no issue here. However kernfs
> > writes to particular files (e.g. cgroup.subtree_control) can also cause
> > file removal, but vfs_write does not attempt to emit fsnotify events
> > after the write operation, even if i_nlink counts are 0. As a use case
> > for monitoring this category of file removals is not known, they are
> > left without having IN_DELETE or IN_DELETE_SELF events generated.
>
> Adding a comment with the above content would probably be useful. It also
> might be worthwhile to note that fanotify recursive monitoring wouldn't work
> reliably as cgroups can go away while inodes are not attached.

Sigh... it's a shame to grow more weird semantics.
But I take this back to the POV of "remote" vs. "local" vfs notifications.
The IN_DELETE_SELF events added by this change are actually
"local" vfs notifications.
If we wanted to support monitoring the cgroup fs super block
for all added/removed cgroups with fanotify, we could
implement that as "remote" notifications, and in that case adding
explicit fsnotify() calls could make sense.
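
For reference, the "local" case above (a single inotify watch on one file
that reports both modifications and the file's own removal) can be sketched
in userspace roughly as follows. This is only a minimal sketch, not code
from the patch: watch_file() and wait_for_removal() are hypothetical helper
names, and the path is whatever the caller supplies (e.g. a cgroup's
memory.events). Only the documented inotify(7) calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an inotify instance with one watch on 'path'
 * covering both content changes and removal of the file itself.
 * Returns the inotify fd, or -1 on error.
 */
int watch_file(const char *path)
{
	int ifd, wd;

	assert(path != NULL);

	ifd = inotify_init1(IN_CLOEXEC);
	if (ifd < 0) {
		perror("inotify_init1");
		return -1;
	}

	wd = inotify_add_watch(ifd, path, IN_MODIFY | IN_DELETE_SELF);
	if (wd < 0) {
		perror("inotify_add_watch");
		close(ifd);
		return -1;
	}
	return ifd;
}

/*
 * Hypothetical helper: read events until the kernel reports IN_IGNORED,
 * i.e. the watch was removed (here: because the file went away).
 * Returns the OR of all event masks seen; *modifications counts IN_MODIFY.
 */
uint32_t wait_for_removal(int ifd, unsigned int *modifications)
{
	/* Buffer aligned for struct inotify_event, as inotify(7) suggests. */
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	uint32_t seen = 0;

	assert(modifications != NULL);
	*modifications = 0;

	while (!(seen & IN_IGNORED)) {
		ssize_t len = read(ifd, buf, sizeof(buf));

		if (len < 0 && errno == EINTR)
			continue;
		if (len <= 0)
			break;

		/* A single read() may return several packed events. */
		for (char *p = buf; p < buf + len; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			seen |= ev->mask;
			if (ev->mask & IN_MODIFY)
				(*modifications)++;
			p += sizeof(*ev) + ev->len;
		}
	}
	return seen;
}
```

A monitoring service would call watch_file() once per file of interest and
treat IN_DELETE_SELF / IN_IGNORED in the returned mask as the signal to drop
its per-cgroup state. As the patch description notes, without these events
on kernfs such a loop would never learn that the file was removed.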
Thanks,
Amir.