Re: [RFC PATCH v1 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd

From: Isaac Manjarres
Date: Fri Dec 06 2024 - 15:48:25 EST


On Fri, Dec 06, 2024 at 06:19:49PM +0000, Lorenzo Stoakes wrote:
> On Thu, Dec 05, 2024 at 05:09:22PM -0800, Isaac J. Manjarres wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b1b2a24ef82e..c7b96b057fda 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -375,6 +375,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > if (!file_mmap_ok(file, inode, pgoff, len))
> > return -EOVERFLOW;
> >
>
> Not maybe in favour of _where_ in the logic we check this and definitely
> not in expanding this do_mmap() stuff much further.
>
> See comment at bottom though... I have a cunning plan :)
>
> > + if (is_exec_sealed(seals)) {
>
> Are we intentionally disallowing a MAP_PRIVATE memfd's mapping's execution?
> I've not tested this scenario so don't know if we somehow disallow this in
> another way but note on write checks we only care about shared mappings.
>
> I mean one could argue that a MAP_PRIVATE situation is the same as copying
> the data into an anon buffer and doing what you want with it, here you
> could argue the same...
>
> So probably we should only care about VM_SHARED?

Thanks for taking a look at this!

I'd originally implemented it for just the VM_SHARED case, but after
discussing it with Kalesh, I changed it to disallow executable
mappings for both MAP_SHARED and MAP_PRIVATE.

Our thought was that write sealing didn't apply in the MAP_PRIVATE case
to support COW with MAP_PRIVATE. There's nothing similar to COW with
execution, so I decided to prevent it for both cases; it also retains
the same behavior as the ashmem driver.

> > + /* No new executable mappings if the file is exec sealed. */
> > + if (prot & PROT_EXEC)
>
> Seems strange to reference a prot flag rather than vma flag, we should have
> that set up by now.

That makes sense. I can change this to check for VM_EXEC in v2 of this
series.
> > + return -EACCES;
> > + /*
> > + * Prevent an initially non-executable mapping from
> > + * later becoming executable via mprotect().
> > + */
> > + vm_flags &= ~VM_MAYEXEC;
> > + }
> > +
>
> You know, I'm in two minds about this... I explicitly moved logic to
> do_mmap() in [0] to workaround a chicken-and-egg scenario with having
> accidentally undone the ability to mmap() read-only F_WRITE_SEALed
> mappings, which meant I 'may as well' move the 'future proofing' clearing
> of VM_MAYWRITE for F_SEAL_FUTURE_WRITE too.
>
> But now I feel that the use of shmem_mmap() and hugetlbfs_file_mmap() to do
> _any_ of this is pretty odious in general, we may as well do it all
> upfront.
>
> [0]:https://lore.kernel.org/all/cover.1732804776.git.lorenzo.stoakes@xxxxxxxxxx/

I agree. I really like the idea of handling the future proofing and
error checking in one place. It makes understanding how these seals
work a lot easier, and has the added benefit of only worrying about
the check once rather than having to duplicate it in both shmem_mmap() and
hugetlbfs_file_mmap().

> > flags_mask = LEGACY_MAP_MASK;
> > if (file->f_op->fop_flags & FOP_MMAP_SYNC)
> > flags_mask |= MAP_SYNC;
> > --
> > 2.47.0.338.g60cca15819-goog
> >
>
> So actually - overall - could you hold off a bit on this until I've had a
> think and can perhaps send a patch that refactors this?
>
> Then your patch can build on top of that one and we can handle this all in
> one place and sanely :)
>
> Sorry you've kicked off thought processes here and that's often a dangerous
> thing :P
Thanks again for reviewing these patches! Happy that I was able to get
the gears turning :)

I'm really interested in helping with this, so is there any forum you'd
like to use for collaborating on this or any way I can help?

I'm also more than happy to test out any patches that you'd like!

Thanks,
Isaac