Re: [RFC PATCH RESEND v2 1/2] mm/memfd: Add support for F_SEAL_FUTURE_EXEC to memfd
From: Lorenzo Stoakes
Date: Wed Jan 08 2025 - 15:45:28 EST
On Mon, Jan 06, 2025 at 05:14:26PM -0800, Isaac Manjarres wrote:
> On Fri, Jan 03, 2025 at 04:03:44PM +0100, Jann Horn wrote:
> > On Fri, Jan 3, 2025 at 12:32 AM Isaac J. Manjarres
> > <isaacmanjarres@xxxxxxxxxx> wrote:
> > > Android currently uses the ashmem driver [1] for creating shared memory
> > > regions between processes. Ashmem buffers can initially be mapped with
> > > PROT_READ, PROT_WRITE, and PROT_EXEC. Processes can then use the
> > > ASHMEM_SET_PROT_MASK ioctl command to restrict--never add--the
> > > permissions that the buffer can be mapped with.
> > >
> > > Processes can remove the ability to map ashmem buffers as executable to
> > > ensure that those buffers cannot be exploited to run unintended code.
> >
> > Is there really code out there that first maps an ashmem buffer with
> > PROT_EXEC, then uses the ioctl to remove execute permission for future
> > mappings? I don't see why anyone would do that.
>
> Hi Jann,
>
> Thanks for your feedback and for taking the time to review these
> patches!
>
> Not that I'm aware of. The reason why I made this seal have semantics
> where it prevents future executable mappings is because there are
> existing applications that allocate an ashmem buffer (default
> permissions are RWX), map the buffer as RW, and then restrict
> the permissions to just R.
>
> When the buffer is mapped as RW, do_mmap() unconditionally sets
> VM_MAYEXEC on the VMA for the mapping, which means that the mapping
> could later be mapped as executable via mprotect(). Therefore, having
> the semantics of the seal be that it prevents any executable mappings
> would break existing code that has already been released. It would
> make transitioning clients to memfd difficult, because to amend that,
> the ashmem users would have to first restrict the permissions of the
> buffer to be RW, then map it as RW, and then restrict the permissions
> again to be just R, which also means an additional system call.
You could do something similar to my adjustments to the F_SEAL_WRITE
changes that clears VM_MAYEXEC in cases where do_mmap() maps an
F_SEAL_EXEC'd without PROT_EXEC.
Please note that F_SEAL_EXEC implies:
F_SEAL_SHRINK
F_SEAL_GROW
F_SEAL_WRITE <- important, obviously
F_SEAL_FUTURE_WRITE <- also important
if (seals & F_SEAL_EXEC && inode->i_mode & 0111)
seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
Though interestingly only _after_ the mapping_deny_writable() call is
performed which is... odd.
Probably worth exploring F_SEAL_EXEC semantics in detail if you haven't
already to see how viable this is.
>
> > > For instance, suppose process A allocates a memfd that is meant to be
> > > read and written by itself and another process, call it B.
> > >
> > > Process A shares the buffer with process B, but process B injects code
> > > into the buffer, and compromises process A, such that it makes A map
> > > the buffer with PROT_EXEC. This provides an opportunity for process A
> > > to run the code that process B injected into the buffer.
> > >
> > > If process A had the ability to seal the buffer against future
> > > executable mappings before sharing the buffer with process B, this
> > > attack would not be possible.
> >
> > I think if you want to enforce such restrictions in a scenario where
> > the attacker can already make the target process perform
> > semi-arbitrary syscalls, it would probably be more reliable to enforce
> > rules on executable mappings with something like SELinux policy and/or
> > F_SEAL_EXEC.
> >
>
> For SELinux policy, do you mean to not allow execmem permissions? What
> about scenarios where a process wants to use JIT compilation, but
> doesn't want memfd data buffers to be executable? My thought was to use
> this new seal to have a finer granularity to control what buffers can
> be mapped as executable. If not, could you please clarify?
>
> Also, F_SEAL_EXEC just seals the memfd's current executable permissions,
> and doesn't affect the mapping permissions at all. Are you saying that
> F_SEAL_EXEC should be extended to cover mappings as well? If so, it is
> not clear to me what the semantics of that would be.
I need to dig into how this functions, but I'm guessing then it doesn't
immediatley enforce anything preventing an existing mapping from executing?
Which differs from F_SEAL_WRITE semantics and then makes it seem like it is
already acting like an F_SEAL_FUTURE_EXEC in effect?
Hm need to dig into this a bit.
>
> For instance, if a memfd is non-executable and F_SEAL_EXEC is applied,
> we can also prevent any executable mappings at that point. I'm not sure
> if that's the right thing to do though. For instance, there are shared
> object files that don't have executable permissions, but their code
> sections should be mapped as executable. So, drawing from that, I'm not
> sure if it makes sense to tie the file execution permissions to the
> mapping permissions.
>
> There's also the case where F_SEAL_EXEC is invoked on an executable
> memfd. In that case, there doesn't seem to be anything to do from a
> mapping perspective since memfds can be mapped as executable by
> default?
>
> > > Android is currently trying to replace ashmem with memfd. However, memfd
> > > does not have a provision to permanently remove the ability to map a
> > > buffer as executable, and leaves itself open to the type of attack
> > > described earlier. However, this should be something that can be
> > > achieved via a new file seal.
> > >
> > > There are known usecases (e.g. CursorWindow [2]) where a process
> > > maps a buffer with read/write permissions before restricting the buffer
> > > to being mapped as read-only for future mappings.
> >
> > Here you're talking about write permission, but the patch is about
> > execute permission?
> >
>
> Sorry, I used this example about write permission to show why I implemented
> the seal with support for preventing future mappings, since the writable
> mappings that get created can become executable in the future, as
> described later in the commit text.
>
> > > The resulting VMA from the writable mapping has VM_MAYEXEC set, meaning
> > > that mprotect() can change the mapping to be executable. Therefore,
> > > implementing the seal similar to F_SEAL_WRITE would not be appropriate,
> > > since it would not work with the CursorWindow usecase. This is because
> > > the CursorWindow process restricts the mapping permissions to read-only
> > > after the writable mapping is created. So, adding a file seal for
> > > executable mappings that operates like F_SEAL_WRITE would fail.
> > >
> > > Therefore, add support for F_SEAL_FUTURE_EXEC, which is handled
> > > similarly to F_SEAL_FUTURE_WRITE. This ensures that CursorWindow can
> > > continue to create a writable mapping initially, and then restrict the
> > > permissions on the buffer to be mappable as read-only by using both
> > > F_SEAL_FUTURE_WRITE and F_SEAL_FUTURE_EXEC. After the seal is
> > > applied, any calls to mmap() with PROT_EXEC will fail.
> > >
> > > [1] https://cs.android.com/android/kernel/superproject/+/common-android-mainline:common/drivers/staging/android/ashmem.c
> > > [2] https://developer.android.com/reference/android/database/CursorWindow
> > >
> > > Signed-off-by: Isaac J. Manjarres <isaacmanjarres@xxxxxxxxxx>
> > > ---
> > > include/uapi/linux/fcntl.h | 1 +
> > > mm/memfd.c | 39 +++++++++++++++++++++++++++++++++++++-
> > > 2 files changed, 39 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index 6e6907e63bfc..ef066e524777 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -49,6 +49,7 @@
> > > #define F_SEAL_WRITE 0x0008 /* prevent writes */
> > > #define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> > > #define F_SEAL_EXEC 0x0020 /* prevent chmod modifying exec bits */
> > > +#define F_SEAL_FUTURE_EXEC 0x0040 /* prevent future executable mappings */
> > > /* (1U << 31) is reserved for signed error codes */
> > >
> > > /*
> > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > index 5f5a23c9051d..cfd62454df5e 100644
> > > --- a/mm/memfd.c
> > > +++ b/mm/memfd.c
> > > @@ -184,6 +184,7 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
> > > }
> > >
> > > #define F_ALL_SEALS (F_SEAL_SEAL | \
> > > + F_SEAL_FUTURE_EXEC |\
> > > F_SEAL_EXEC | \
> > > F_SEAL_SHRINK | \
> > > F_SEAL_GROW | \
> > > @@ -357,14 +358,50 @@ static int check_write_seal(unsigned long *vm_flags_ptr)
> > > return 0;
> > > }
> > >
> > > +static inline bool is_exec_sealed(unsigned int seals)
> > > +{
> > > + return seals & F_SEAL_FUTURE_EXEC;
> > > +}
> > > +
> > > +static int check_exec_seal(unsigned long *vm_flags_ptr)
> > > +{
> > > + unsigned long vm_flags = *vm_flags_ptr;
> > > + unsigned long mask = vm_flags & (VM_SHARED | VM_EXEC);
> > > +
> > > + /* Executability is not a concern for private mappings. */
> > > + if (!(mask & VM_SHARED))
> > > + return 0;
> >
> > Why is it not a concern for private mappings?
> >
> I didn't consider private mappings since it wasn't clear as to how
> they could be a threat to another process. A process can copy the
> contents of the buffer into another executable region of memory
> and just run it from there? Or are you saying that because it
> can do that, is there any value in differentiating between
> shared and private mappings?
Yes this is my point of view also but I might be missing something.
>
> Thanks,
> Isaac