Re: [PATCH v1] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`

From: Andrew Morton
Date: Thu May 23 2024 - 15:45:35 EST


On Wed, 22 May 2024 19:32:35 -0700 Jeff Xu <jeffxu@xxxxxxxxxx> wrote:

> >
> > It's a change to a userspace API, yes? Please let's have a detailed
> > description of why this is OK. Why it won't affect any existing users.
> >
> Unfortunately, this is a breaking change that might break a
> application if they do below:
> memfd_create("", MFD_NOEXEC_SEAL)
> fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in new
> semantics, due to mfd not being sealable.
>
> However, I still think the new semantics is a better, the reason is
> due to the sysctl: memfd_noexec_scope
> Currently, when the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd becomes sealable.
> E.g.
> When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> The app calls memfd_create("",0)
> application will get sealable memfd, which might be a surprise to application.
>
> If the app doesn't want this behavior, they will need one of two below
> in current implementation.
> 1>
> set the sysctl: memfd_noexec_scope to 0.
> So the kernel doesn't overwrite the mdmfd_create
>
> 2>
> modify their code to get non-sealable NOEXEC memfd.
> memfd_create("", MEMFD_NOEXEC_SCOPE_NOEXEC)
> fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL)
>
> The new semantics works better with the sysctl.
>
> Since memfd noexec is new, maybe there is no application using the
> MFD_NOEXEC_SEAL to create
> sealable memfd. They mostly likely use
> memfd(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead.
> I think it might benefit in the long term with the new semantics.

Yes, it's new so I expect any damage will be small. Please prepare a
v2 which fully explains/justifies the thinking for this
non-backward-compatible change and which include the cc:stable.