Re: [PATCH v2 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling

From: Mike Kravetz
Date: Mon Oct 03 2022 - 22:33:19 EST


On 10/03/22 20:37, Peter Xu wrote:
> After the recent rework patchset of hugetlb locking on pmd sharing,
> kselftest for userfaultfd sometimes fails on hugetlb private tests with
> unexpected write fault checks.
>
> It turns out there's nothing wrong within the locking series regarding this
> matter, but it could have changed the timing of threads so it can trigger
> an old bug.
>
> The real bug is when we call hugetlb_no_page() we're not with the pgtable
> lock. It means we're reading the pte values lockless. It's perfectly fine
> in most cases because before we do normal page allocations we'll take the
> lock and check pte_same() again. However before that, there are actually
> two paths on userfaultfd missing/minor handling that may directly move on
> with the fault process without checking the pte values.
>
> It means for these two paths we may be generating an uffd message based on
> an unstable pte, while an unstable pte can legally be anything as long as
> the modifier holds the pgtable lock.
>
> One example, which is also what happened in the failing kselftest and
> caused the test failure, is that for private mappings wr-protection changes
> can happen on one page. While hugetlb_change_protection() generally
> requires pte being cleared before being changed, then there can be a race
> condition like:
>
> thread 1 thread 2
> -------- --------
>
> UFFDIO_WRITEPROTECT hugetlb_fault
> hugetlb_change_protection
> pgtable_lock()
> huge_ptep_modify_prot_start
> pte==NULL
> hugetlb_no_page
> generate uffd missing event
> even if page existed!!
> huge_ptep_modify_prot_commit
> pgtable_unlock()
>
> Fix this by recheck the pte after pgtable lock for both userfaultfd missing
> & minor fault paths.
>
> This bug should have been around starting from uffd hugetlb introduced, so
> attaching a Fixes to the commit. Also attach another Fixes to the minor
> support commit for easier tracking.
>
> Note that userfaultfd is actually fine with false positives (e.g. caused by
> pte changed), but not wrong logical events (e.g. caused by reading a pte
> during changing). The latter can confuse the userspace, so the strictness
> is very much preferred. E.g., MISSING event should never happen on the
> page after UFFDIO_COPY has correctly installed the page and returned.
>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
> Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
> Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode")
> Co-developed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 49 insertions(+), 6 deletions(-)

Thanks.

Do note that this will not apply on top of "mm: hugetlb: fix UAF in
hugetlb_handle_userfault" which is already in Andrew's tree. However,
required changes should be simple.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
--
Mike Kravetz

>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9679fe519b90..fa3fcdb0c4b8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5521,6 +5521,23 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
> return ret;
> }
>
> +/*
> + * Recheck pte with pgtable lock. Returns true if pte didn't change, or
> + * false if pte changed or is changing.
> + */
> +static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
> + pte_t *ptep, pte_t old_pte)
> +{
> + spinlock_t *ptl;
> + bool same;
> +
> + ptl = huge_pte_lock(h, mm, ptep);
> + same = pte_same(huge_ptep_get(ptep), old_pte);
> + spin_unlock(ptl);
> +
> + return same;
> +}
> +
> static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> struct vm_area_struct *vma,
> struct address_space *mapping, pgoff_t idx,
> @@ -5562,9 +5579,30 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> goto out;
> /* Check for page in userfault range */
> if (userfaultfd_missing(vma)) {
> - ret = hugetlb_handle_userfault(vma, mapping, idx,
> - flags, haddr, address,
> - VM_UFFD_MISSING);
> + /*
> + * Since hugetlb_no_page() was examining pte
> + * without pgtable lock, we need to re-test under
> + * lock because the pte may not be stable and could
> + * have changed from under us. Try to detect
> + * either changed or during-changing ptes and retry
> + * properly when needed.
> + *
> + * Note that userfaultfd is actually fine with
> + * false positives (e.g. caused by pte changed),
> + * but not wrong logical events (e.g. caused by
> + * reading a pte during changing). The latter can
> + * confuse the userspace, so the strictness is very
> + * much preferred. E.g., MISSING event should
> + * never happen on the page after UFFDIO_COPY has
> + * correctly installed the page and returned.
> + */
> + if (hugetlb_pte_stable(h, mm, ptep, old_pte))
> + ret = hugetlb_handle_userfault(
> + vma, mapping, idx, flags, haddr,
> + address, VM_UFFD_MISSING);
> + else
> + /* Retry the fault */
> + ret = 0;
> goto out;
> }
>
> @@ -5634,9 +5672,14 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (userfaultfd_minor(vma)) {
> unlock_page(page);
> put_page(page);
> - ret = hugetlb_handle_userfault(vma, mapping, idx,
> - flags, haddr, address,
> - VM_UFFD_MINOR);
> + /* See comment in userfaultfd_missing() block above */
> + if (hugetlb_pte_stable(h, mm, ptep, old_pte))
> + ret = hugetlb_handle_userfault(
> + vma, mapping, idx, flags, haddr,
> + address, VM_UFFD_MINOR);
> + else
> + /* Retry the fault */
> + ret = 0;
> goto out;
> }
> }
> --
> 2.37.3
>