Re: [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child

From: Peter Xu
Date: Mon May 03 2021 - 18:41:11 EST


On Mon, May 03, 2021 at 03:10:04PM -0700, Mike Kravetz wrote:
> On 5/3/21 2:41 PM, Peter Xu wrote:
> > On Mon, May 03, 2021 at 01:53:03PM -0700, Mike Kravetz wrote:
> >> On 5/1/21 7:41 AM, Peter Xu wrote:
> >>> When fork() and copy hugetlb page range, we'll remember to wrprotect src pte if
> >>> needed, however we forget about the child! Without it, the child will be able
> >>> to write to parent's pages when mapped as PROT_READ|PROT_WRITE and MAP_PRIVATE,
> >>> which will cause data corruption in the parent process.
> >>>
> >>> This issue can also be exposed by "memfd_test hugetlbfs" kselftest (if it can
> >>> pass the F_SEAL_FUTURE_WRITE test first, though).
> >>>
> >>> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> >>> ---
> >>> mm/hugetlb.c | 2 ++
> >>> 1 file changed, 2 insertions(+)
> >>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> >
> > Thanks!
> >
> >>
> >> I think we need to add, "Fixes: 4eae4efa2c29" as this is now in v5.12
> >
> > I could be mistaken, but my understanding is it's broken from the most initial
> > cow support of hugetlbfs in 2006... So if we want a fixes tag, maybe this?
> >
> > Fixes: 1e8f889b10d8d ("[PATCH] Hugetlb: Copy on Write support")
> >
>
> Here is why I think it was broken in 4eae4efa2c29. Prior to that commit
> the code looked like this:
>
> if (cow) {
> /*
> * No need to notify as we are downgrading page
> * table protection not changing it to point
> * to a new page.
> *
> * See Documentation/vm/mmu_notifier.rst
> */
> huge_ptep_set_wrprotect(src, addr, src_pte);
> }
> entry = huge_ptep_get(src_pte);
> ptepage = pte_page(entry);
> get_page(ptepage);
> page_dup_rmap(ptepage, true);
> set_huge_pte_at(dst, addr, dst_pte, entry);
> hugetlb_count_add(pages_per_huge_page(h), dst);
>
> After setting the wrprotect in the source pte, we 'huge_ptep_get' the
> source to create the destination. Hence, wrprotect will be set in the
> destination as well. It is perhaps not the most efficient, but
> I think it 'works'.
>
> It is subtle, or am I missing something?

You're right, thanks Mike. I'll repost and add correct fixes tag.

--
Peter Xu