Re: [REGRESSION] Re: [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches

From: Amir Goldstein
Date: Sun Feb 02 2025 - 02:46:56 EST


On Sun, Feb 2, 2025 at 1:58 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, 1 Feb 2025 at 06:38, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > Ok, but those "device fds" aren't really device fds in the sense that
> > they are character fds. They are regular files afaict from:
> >
> > vfio_device_open_file(struct vfio_device *device)
> >
> > (Well, it's actually worse as anon_inode_getfile() files don't have any
> > mode at all but that's beside the point.)?
> >
> > In any case, I think you're right that such files would (accidently?)
> > qualify for content watches afaict. So at least that should probably get
> > FMODE_NONOTIFY.
>
> Hmm. Can we just make all anon_inodes do that? I don't think you can
> sanely have pre-content watches on anon-inodes, since you can't really
> have access to them to _set_ the content watch from outside anyway..
>
> In fact, maybe do it in alloc_file_pseudo()?
>

The problem is that we cannot set FMODE_NONOTIFY -
we tried that once but it regressed some workloads watching
write on pipe fd or something.

and the no-pre-content is a flag combination (to save FMODE_ flags)
which makes things a bit messy.

We could try to initialize f_mode to FMODE_NONOTIFY_PERM
for anon_inode, which opts out of both permission and pre-content
events and leaves the legacy inotify workloads unaffected.

But, then code like this will not do the right thing:

/* We refuse fsnotify events on ptmx, since it's a shared resource */
filp->f_mode |= FMODE_NONOTIFY;

We will need to convert all those to use a helper.
I am traveling today so will be able to look closer tomorrow.

Jan,

What do you think?

Amir.