Re: [PATCH v2 1/2] mm/memfd: Refactor and cleanup the logic in memfd_create()

From: Lorenzo Stoakes
Date: Wed Jan 08 2025 - 15:24:20 EST


On Wed, Jan 08, 2025 at 12:04:40PM -0800, Isaac Manjarres wrote:

[ snipping stuff for brevity - thanks for taking a look at the other raised
issues! ]

> > > +{
> > > + int error;
> > > + char *name;
> > > + long len;
> > >
> > > /* length includes terminating zero */
> > > len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
> > > if (len <= 0)
> > > - return -EFAULT;
> > > + return ERR_PTR(-EFAULT);
> >
> > Not sure if this is necessary, I mean I guess technically... but feels like
> > it's adding a bunch of noise.
> >
> > I know you refactor this whole thing in the next commit so maybe to reduce
> > the size of this commit you could drop these changes here and keep the bare
> > minimum before you change the function again?
> >
>
> To make sure I understood correctly, do you mean that I should combine
> introducing alloc_name() and changing the handling for how the name is
> allocated in the next patch instead and just leave this logic alone
> in this patch?

It's not actually a huge big deal, I mean just drop the ERR_PTR() stuff as
you refactor again just after.

But this is optional!


> >
> > Then, as mentioned below, restore the original ordering of fd assignment in
> > the syscall, do so before file allocation, and install the file into the fd
> > afterwards.
> >
> > > +{
> > > + unsigned int *file_seals;
> > > + struct file *file;
> > > + int error;
> > > +
> > > + error = memfd_validate_flags(flags);
> > > + if (error < 0)
> > > + return ERR_PTR(error);
> >
> > I'm not actually sure why you put this here, it seems quite
> > arbitrary. Let's invoke this from the syscall so we neatly divide the logic
> > into each part rather than dividing into different parts one of which is
> > invoked by another.
>
> I put the flags validation here since the flags are mostly used in this
> function, so it made sense to me embed the flags validation into here.
> However, I do understand how splitting this out as it was before makes
> more sense now, especially since the flags can change. It seems odd
> for alloc_file() to have a side effect of changing the flags, so I
> will place the flags sanitization back to where it was.

Ack sure, I mean the original was horrid, so I get it.

But to me makes more sense to divide up the tasks into smaller functions
(again, I love the intent of this series, this is a very Lorenzo-style
thing to do so very en-vogue-Stoakes which I naturally appreciate :) - and
this is just another part that is a little simpler to separate out at the
top level.

It's not a huge thing but I think improves the code.

>
> >
> > And obviously make sure the original ordering is restored.
> >
> > >
> > > if (flags & MFD_HUGETLB) {
> > > file = hugetlb_file_setup(name, 0, VM_NORESERVE,
> > > @@ -433,10 +441,8 @@ SYSCALL_DEFINE2(memfd_create,
> > > MFD_HUGE_MASK);
> > > } else
> > > file = shmem_file_setup(name, 0, VM_NORESERVE);
> >
> > I know not to do with you, and strictly this is probably in line with
> > kernel code style, but this dangling else _kills_ me.
> >
> > Could we put { } around it? Risking invoking the ire of the strict
> > adherents of the coding style perhaps here :P
> >
>
> Absolutely!

I disclaim any responsibility for people moaning at you for doing this :P

>
> > > +
> > > + file = memfd_file_create(name, flags);
> > > + /* name is not needed beyond this point. */
> >
> > This is nice to highlight! Though it absolutely SUCKS to kmalloc() some
> > memory then instantly discard it like this. But I suppose we have no choice
> > here.
> >
>
> What if instead, we allocate a buffer on the stack and change
> alloc_name() to populate_name() and have it take in a pointer
> to that buffer and join the memfd name prefix and copy the
> name from userspace into that buffer? That avoids having to
> dynamically allocate a buffer that gets freed right away,
> and also gets rid of the cleanup, since that memory will be
> deallocated when we return from the function.

Ah no haha let's not do that, kernel stack space is in short supply. Just a
mild 'hmm'. Let's keep it heap allocated right now, max name size is 255
bytes so yeah.

>
> > > kfree(name);
> > > - return error;
> > > + if (IS_ERR(file))
> > > + return PTR_ERR(file);
> > > +
> > > + fd = get_unused_fd_flags((flags & MFD_CLOEXEC) ? O_CLOEXEC : 0);
> > > + if (fd >= 0)
> > > + fd_install(fd, file);
> > > + else
> > > + fput(file);
> >
> > You've changed the ordering of this again, and you're not doing anything to
> > free file if something goes wrong with fd allocation? That's a leak no?
> >
>
> file only has one reference to it at this point, and in the else branch,
> or failure case, fput() is invoked to drop the reference on file, which
> will cause it to be freed, so I don't think it's a leak, unless I missed
> something?

You're right, my mistake! Apologies.

>
> > Please reinstate the original ordering.
>
> I'm happy to reinstate the original ordering, but for my knowledge, is
> there a reason as to why fd allocation before file structure allocation
> is preferred instead of the other way around?

Generally I'd prefer us to keep the original ordering (where sane to do
so), as a user might, as insane as it sounds, rely upon a certain error
being returned first, that is literally check the error code for a specific
error, and do something based on that.

As silly as that might sound, this is now out in the wild, and once so we
should try to keep the behaviour as similar as possible.

I think there's some leeway to switching things up if it became so utterly
egregious not to do so, but if it's relatively straightforward we should try.

To me it makes more sense anyway to:

< check if we can reserve an fd >

< allocate the file >

< assign the fd to the file>

But this _may_ just be me... :)

>
> Thanks,
> Isaac
>