Re: [PATCH v4] mm/userfaultfd: fix hugetlb fault mutex hash calculation
From: David Hildenbrand (Arm)
Date: Wed Mar 25 2026 - 04:51:34 EST
On 3/25/26 01:03, Andrew Morton wrote:
> On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@xxxxxxxxx> wrote:
>
>> On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@xxxxxxxxxx wrote:
>>> Just wondering whether making the shift explicit here instead of
>>> introducing another hugetlb helper might be sufficient?
>>>
>>> idx >>= huge_page_order(hstate_vma(vma));
>>
>> That would work for hugetlb VMAs since both (address - vm_start) and
>> vm_pgoff are guaranteed to be huge page aligned. However, David
>> suggested introducing hugetlb_linear_page_index() to provide a cleaner
>> API that mirrors linear_page_index(), so I kept this approach.
>>
>
> Thanks.
>
> Would anyone like to review this cc:stable patch for us?
I would hope the hugetlb+userfaultfd submaintainers could have a
detailed look! Moving them to "To:".

One of the issues why this doesn't get more attention might be that the
new revision was posted as a reply to an old revision, which is an
anti-pattern :)
>
>
> From: Jianhui Zhou <jianhuizzzzz@xxxxxxxxx>
> Subject: mm/userfaultfd: fix hugetlb fault mutex hash calculation
> Date: Tue, 10 Mar 2026 19:05:26 +0800
>
> In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
> page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
> returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
> expects the index in huge page units. This mismatch means that different
> addresses within the same huge page can produce different hash values,
> leading to the use of different mutexes for the same huge page. This can
> cause races between faulting threads, which can corrupt the reservation
> map and trigger the BUG_ON in resv_map_release().
>
> Fix this by introducing hugetlb_linear_page_index(), which returns the
> page index in huge page granularity, and using it in place of
> linear_page_index().
>
> Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@xxxxxxxxx
> Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
> Signed-off-by: Jianhui Zhou <jianhuizzzzz@xxxxxxxxx>
> Reported-by: syzbot+f525fd79634858f478e7@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: JonasZhou <JonasZhou@xxxxxxxxxxx>
> Cc: Mike Rapoport <rppt@xxxxxxxxxx>
> Cc: Muchun Song <muchun.song@xxxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Peter Xu <peterx@xxxxxxxxxx>
> Cc: SeongJae Park <sj@xxxxxxxxxx>
> Cc: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/userfaultfd.c | 2 +-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> --- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/include/linux/hugetlb.h
> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
> return h->order + PAGE_SHIFT;
> }
>
> +/**
> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
> + * page size granularity.
> + * @vma: the hugetlb VMA
> + * @address: the virtual address within the VMA
> + *
> + * Return: the page offset within the mapping in huge page units.
> + */
> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
> + unsigned long address)
> +{
> + struct hstate *h = hstate_vma(vma);
> +
> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
> + (vma->vm_pgoff >> huge_page_order(h));
> +}
> +
> static inline bool order_is_gigantic(unsigned int order)
> {
> return order > MAX_PAGE_ORDER;
> --- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/mm/userfaultfd.c
> @@ -573,7 +573,7 @@ retry:
> * in the case of shared pmds. fault mutex prevents
> * races with other faulting threads.
> */
> - idx = linear_page_index(dst_vma, dst_addr);
> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
> mapping = dst_vma->vm_file->f_mapping;
> hash = hugetlb_fault_mutex_hash(mapping, idx);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
> _
>
Let's take a look at other hugetlb_fault_mutex_hash() users:
* remove_inode_hugepages: uses folio->index >> huge_page_order(h)
-> hugetlb granularity
* hugetlbfs_fallocate(): start/index is in hugetlb granularity
-> hugetlb granularity
* memfd_alloc_folio(): idx >>= huge_page_order(h);
-> hugetlb granularity
* hugetlb_wp(): uses vma_hugecache_offset()
-> hugetlb granularity
* hugetlb_handle_userfault(): uses vmf->pgoff, which hugetlb_fault()
sets to vma_hugecache_offset()
-> hugetlb granularity
* hugetlb_no_page(): similarly uses vmf->pgoff
-> hugetlb granularity
* hugetlb_fault(): similarly uses vmf->pgoff
-> hugetlb granularity
So this change here looks good to me.
Reviewed-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
But it raises the question:

(1) Should we convert all that to just operate on the ordinary index,
such that we don't even need hugetlb_linear_page_index()? That would be
an add-on patch.

(2) Alternatively, could we replace all users of vma_hugecache_offset()
with the much cleaner hugetlb_linear_page_index()?

In general, I think we should look into having idx/vmf->pgoff be
consistent with the remainder of MM, converting all code in hugetlb to
do that.
Any takers?
--
Cheers,
David