Re: [PATCH -next 1/2] mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust
From: Andy Lutomirski
Date: Wed Nov 21 2018 - 22:25:46 EST
On Wed, Nov 21, 2018 at 6:27 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 20 Nov 2018 13:13:35 -0800 Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> > > > I am Ok with whatever Andrew wants to do, if it is better to squash it with
> > > > the original, then I can do that and send another patch.
> > > >
> > > >
> > >
> > > From experience, Andrew will food in fixups on request :)
> >
> > Andrew, could you squash this patch into the one titled ("mm: Add an
> > F_SEAL_FUTURE_WRITE seal to memfd")?
>
> Sure.
>
> I could of course queue them separately but I rarely do so - I don't
> think that the intermediate development states are useful in the
> infinite-term, and I make them available via additional Link: tags in
> the changelog footers anyway.
>
> I think that the magnitude of these patches is such that John Stultz's
> Reviewed-by is invalidated, so this series is now in the "unreviewed"
> state.
>
> So can we have a re-review please? For convenience, here's the
> folded-together [1/1] patch, as it will go to Linus.
>
>
> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> Subject: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd
>
> Android uses ashmem for sharing memory regions. We are looking forward to
> migrating all usecases of ashmem to memfd so that we can possibly remove
> the ashmem driver in the future from staging while also benefiting from
> using memfd and contributing to it. Note staging drivers are also not ABI
> and generally can be removed at anytime.
>
> One of the main usecases Android has is the ability to create a region and
> mmap it as writeable, then add protection against making any "future"
> writes while keeping the existing already mmap'ed writeable-region active.
> This allows us to implement a usecase where receivers of the shared
> memory buffer can get a read-only view, while the sender continues to
> write to the buffer. See CursorWindow documentation in Android for more
> details:
> https://developer.android.com/reference/android/database/CursorWindow
>
> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
> which prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:
>
> #include <stdio.h>
> #include <errno.h>
> #include <sys/mman.h>
> #include <linux/memfd.h>
> #include <linux/fcntl.h>
> #include <asm/unistd.h>
> #include <unistd.h>
> #define F_SEAL_FUTURE_WRITE 0x0010
> #define REGION_SIZE (5 * 1024 * 1024)
>
> int memfd_create_region(const char *name, size_t size)
> {
> int ret;
> int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
> if (fd < 0) return fd;
> ret = ftruncate(fd, size);
> if (ret < 0) { close(fd); return ret; }
> return fd;
> }
>
> int main() {
> int ret, fd;
> void *addr, *addr2, *addr3, *addr1;
> ret = memfd_create_region("test_region", REGION_SIZE);
> printf("ret=%d\n", ret);
> fd = ret;
>
> // Create map
> addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> if (addr == MAP_FAILED)
> printf("map 0 failed\n");
> else
> printf("map 0 passed\n");
>
> if ((ret = write(fd, "test", 4)) != 4)
> printf("write failed even though no future-write seal "
> "(ret=%d errno =%d)\n", ret, errno);
> else
> printf("write passed\n");
>
> addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> if (addr1 == MAP_FAILED)
> perror("map 1 prot-write failed even though no seal\n");
> else
> printf("map 1 prot-write passed as expected\n");
>
> ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
> F_SEAL_GROW |
> F_SEAL_SHRINK);
> if (ret == -1)
> printf("fcntl failed, errno: %d\n", errno);
> else
> printf("future-write seal now active\n");
>
> if ((ret = write(fd, "test", 4)) != 4)
> printf("write failed as expected due to future-write seal\n");
> else
> printf("write passed (unexpected)\n");
>
> addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> if (addr2 == MAP_FAILED)
> perror("map 2 prot-write failed as expected due to seal\n");
> else
> printf("map 2 passed\n");
>
> addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
> if (addr3 == MAP_FAILED)
> perror("map 3 failed\n");
> else
> printf("map 3 prot-read passed as expected\n");
> }
>
> The output of running this program is as follows:
> ret=3
> map 0 passed
> write passed
> map 1 prot-write passed as expected
> future-write seal now active
> write failed as expected due to future-write seal
> map 2 prot-write failed as expected due to seal
> : Permission denied
> map 3 prot-read passed as expected
>
> [joel@xxxxxxxxxxxxxxxxx: make F_SEAL_FUTURE_WRITE seal more robust]
> Link: http://lkml.kernel.org/r/20181120052137.74317-1-joel@xxxxxxxxxxxxxxxxx
> Link: http://lkml.kernel.org/r/20181108041537.39694-1-joel@xxxxxxxxxxxxxxxxx
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> Cc: John Stultz <john.stultz@xxxxxxxxxx>
> Cc: John Reck <jreck@xxxxxxxxxx>
> Cc: Todd Kjos <tkjos@xxxxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Cc: Daniel Colascione <dancol@xxxxxxxxxx>
> Cc: J. Bruce Fields <bfields@xxxxxxxxxxxx>
> Cc: Jeff Layton <jlayton@xxxxxxxxxx>
> Cc: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
> Cc: Lei Yang <Lei.Yang@xxxxxxxxxxxxx>
> Cc: Marc-Andr Lureau <marcandre.lureau@xxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Shuah Khan <shuah@xxxxxxxxxx>
> Cc: Valdis Kletnieks <valdis.kletnieks@xxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
>
> --- a/include/uapi/linux/fcntl.h~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/include/uapi/linux/fcntl.h
> @@ -41,6 +41,7 @@
> #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
> #define F_SEAL_GROW 0x0004 /* prevent file from growing */
> #define F_SEAL_WRITE 0x0008 /* prevent writes */
> +#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
> /* (1U << 31) is reserved for signed error codes */
>
> /*
> --- a/mm/memfd.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/mm/memfd.c
> @@ -131,7 +131,8 @@ static unsigned int *memfd_file_seals_pt
> #define F_ALL_SEALS (F_SEAL_SEAL | \
> F_SEAL_SHRINK | \
> F_SEAL_GROW | \
> - F_SEAL_WRITE)
> + F_SEAL_WRITE | \
> + F_SEAL_FUTURE_WRITE)
>
> static int memfd_add_seals(struct file *file, unsigned int seals)
> {
> --- a/fs/hugetlbfs/inode.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/fs/hugetlbfs/inode.c
> @@ -530,7 +530,7 @@ static long hugetlbfs_punch_hole(struct
> inode_lock(inode);
>
> /* protected by i_mutex */
> - if (info->seals & F_SEAL_WRITE) {
> + if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE)) {
> inode_unlock(inode);
> return -EPERM;
> }
> --- a/mm/shmem.c~mm-add-an-f_seal_future_write-seal-to-memfd
> +++ a/mm/shmem.c
> @@ -2119,6 +2119,23 @@ out_nomem:
>
> static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
> {
> + struct shmem_inode_info *info = SHMEM_I(file_inode(file));
> +
> + /*
> + * New PROT_READ and MAP_SHARED mmaps are not allowed when "future
PROT_WRITE, perhaps?
> + * write" seal active.
> + */
> + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE) &&
> + (info->seals & F_SEAL_FUTURE_WRITE))
> + return -EPERM;
> +
> + /*
> + * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED read-only
> + * mapping, take care to not allow mprotect to revert protections.
> + */
> + if (info->seals & F_SEAL_FUTURE_WRITE)
> + vma->vm_flags &= ~(VM_MAYWRITE);
> +
This might all be clearer as:
if (info->seals & F_SEAL_FUTURE_WRITE) {
if (vma->vm_flags ...)
return -EPERM;
vma->vm_flags &= ~VM_MAYWRITE;
}
with appropriate comments inserted.