[PATCH v1] mm/hugetlb: don't map folios writable without VM_WRITE when copying during fork()

From: David Hildenbrand
Date: Wed Dec 04 2024 - 10:50:17 EST


If we have to trigger a hugetlb folio copy during fork() because the anon
folio might be pinned, we currently unconditionally create a writable
PTE.

However, the VMA might not have write permissions (VM_WRITE) at that
point.

Fix it by checking the VMA for VM_WRITE. Make the code less error prone
by moving checking for VM_WRITE into make_huge_pte(), and letting
callers only specify whether we should try making it writable.

A simple reproducer that longter-pins the folios using liburing to then
mprotect(PROT_READ) the folios befor fork() [1] results in:

Before:
[FAIL] access should not have worked

After:
[PASS] access did not work as expected

[1] https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/hugetlb-mkwrite-fork.c

This is rather a corner case, so stable might not be warranted.

Fixes: 4eae4efa2c29 ("hugetlb: do early cow when page pinned on src mm")
Cc: Muchun Song <muchun.song@xxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: Guillaume Morin <guillaume@xxxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
mm/hugetlb.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5c8de0f5c760..6db4e8176303 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5141,12 +5141,12 @@ const struct vm_operations_struct hugetlb_vm_ops = {
};

static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
- int writable)
+ bool try_mkwrite)
{
pte_t entry;
unsigned int shift = huge_page_shift(hstate_vma(vma));

- if (writable) {
+ if (try_mkwrite && (vma->vm_flags & VM_WRITE)) {
entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
vma->vm_page_prot)));
} else {
@@ -5199,7 +5199,7 @@ static void
hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
struct folio *new_folio, pte_t old, unsigned long sz)
{
- pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
+ pte_t newpte = make_huge_pte(vma, &new_folio->page, true);

__folio_mark_uptodate(new_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, addr);
@@ -6223,8 +6223,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
hugetlb_add_new_anon_rmap(folio, vma, vmf->address);
else
hugetlb_add_file_rmap(folio);
- new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
- && (vma->vm_flags & VM_SHARED)));
+ new_pte = make_huge_pte(vma, &folio->page, vma->vm_flags & VM_SHARED);
/*
* If this pte was previously wr-protected, keep it wr-protected even
* if populated.
@@ -6556,7 +6555,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
spinlock_t *ptl;
int ret = -ENOMEM;
struct folio *folio;
- int writable;
bool folio_in_pagecache = false;

if (uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON)) {
@@ -6710,12 +6708,8 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
* with wp flag set, don't set pte write bit.
*/
- if (wp_enabled || (is_continue && !vm_shared))
- writable = 0;
- else
- writable = dst_vma->vm_flags & VM_WRITE;
-
- _dst_pte = make_huge_pte(dst_vma, &folio->page, writable);
+ _dst_pte = make_huge_pte(dst_vma, &folio->page,
+ !wp_enabled && !(is_continue && !vm_shared));
/*
* Always mark UFFDIO_COPY page dirty; note that this may not be
* extremely important for hugetlbfs for now since swapping is not
--
2.47.1