Re: [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE

From: Mike Kravetz
Date: Mon May 03 2021 - 14:57:14 EST


On 5/1/21 7:41 AM, Peter Xu wrote:
> F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
> There is a test program for that and it fails constantly.
>
> $ ./memfd_test hugetlbfs
> memfd-hugetlb: CREATE
> memfd-hugetlb: BASIC
> memfd-hugetlb: SEAL-WRITE
> memfd-hugetlb: SEAL-FUTURE-WRITE
> mmap() didn't fail as expected
> Aborted (core dumped)
>
> I think it's probably because no one is really running the hugetlbfs test.
>
> Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in
> shmem_mmap(). Generalize a helper for that.
>
> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
> fs/hugetlbfs/inode.c | 5 +++++
> include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
> mm/shmem.c | 22 ++++------------------
> 3 files changed, 41 insertions(+), 18 deletions(-)

Thanks Peter and Hugh!

One question below,

>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index a2a42335e8fd2..39922c0f2fc8c 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
> static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> {
> struct inode *inode = file_inode(file);
> + struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
> loff_t len, vma_len;
> int ret;
> struct hstate *h = hstate_file(file);
>
> + ret = seal_check_future_write(info->seals, vma);
> + if (ret)
> + return ret;
> +
> /*
> * vma address alignment (but not the pgoff alignment) has
> * already been checked by prepare_hugepage_range. If you add

The full comment below the code you added is:

/*
* vma address alignment (but not the pgoff alignment) has
* already been checked by prepare_hugepage_range. If you add
* any error returns here, do so after setting VM_HUGETLB, so
* is_vm_hugetlb_page tests below unmap_region go the right
* way when do_mmap unwinds (may be important on powerpc
* and ia64).
*/

This comment was added in commit 68589bc35303 by Hugh, although it
appears David Gibson added the reason for the comment in the commit
message:

"If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it. That will eventually call down to the
non-hugepage version of unmap_page_range(). On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries. I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on."

There are still comments in the unmap code about special handling of
ppc64 PUDs. So, this may still be an issue.

I am trying to dig into the code to determine if this is still and
issue. Just curious if you looked into this? Might be simpler and
safer to just put the seal check after setting the VM_HUGETLB flag?

--
Mike Kravetz