Re: [RFC 1/1] mm/pagewalk: don't split device-backed huge pfnmaps

From: David Hildenbrand (Arm)

Date: Wed Mar 11 2026 - 06:46:05 EST

On 3/11/26 11:34, Boone, Max wrote:
>
>
>> On Mar 11, 2026, at 10:59 AM, David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
>>
>>> The -EINVAL originates from:
>>>
>>> vfio_dma_do_map -> vfio_pin_map_dma -> vfio_pin_pages_remote
>>> -> vaddr_get_pfns -> pin_user_pages_remote (mm/gup.c)
>>>
>>> Possibly that’s also the origin of the concurrent PUD modification that requires
>>> the retry in the walker in this patch.
>>
>> We'd have to find out why we manage to trigger a -EINVAL here. I don't
>> see how anything that this patch does could trigger that. So maybe a
>> problem in user space? (calling it on unsupported VMAs?).
>>
>
> It looks like I was mistaken about the EINVAL coming from pin_user_pages_remote;
> it actually originates from:
>
> vfio_dma_do_map
> -> vfio_pin_map_dma
> -> vfio_pin_pages_remote
> -> vaddr_get_pfns
> -> follow_fault_pfn
> -> follow_pfnmap_start (mm/memory.c)
>
> In vfio_iommu_type1.c, follow_fault_pfn first checks whether follow_pfnmap_start
> returns an error; if it does, it calls fixup_user_fault to fault the mapping in and then
> retries follow_pfnmap_start to obtain the PFN.
>
> Sounds to me like the walker is likely re-splitting the PUD entry between
> the fixup_user_fault and follow_pfnmap_start calls?

Could be. IIRC, there are also scenarios where handle_mm_fault() just
returns success even though nothing was faulted in (e.g., when it
detects some races).

The code in follow_fault_pfn() should likely be updated to handle more
than one attempt. That's also what GUP does.

Likely, follow_fault_pfn() was never taught about PFNMAP mappings that
can be faulted+zapped (in the past they were always static).

If you turn that into a (possibly) endless loop, does the problem go away?
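
As a rough illustration of the loop idea, here is a minimal user-space
sketch. It is not kernel code: mock_follow_pfnmap() and mock_fixup_fault()
are made-up stand-ins for follow_pfnmap_start() and fixup_user_fault(),
their signatures are simplified assumptions, and zaps_remaining is a
hypothetical knob that models the entry being zapped/split again after a
"successful" fault. Locking and signal handling are left out entirely.

```c
/*
 * User-space model only. The mock_* helpers are hypothetical stand-ins
 * and do not match the real kernel signatures.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: number of lookups that still fail because the entry vanished. */
static int zaps_remaining;
/* Counts how often we had to fault the mapping in. */
static int fault_calls;

/* Stand-in for follow_pfnmap_start(): -EINVAL while the mapping is absent. */
static int mock_follow_pfnmap(unsigned long vaddr, unsigned long *pfn)
{
	if (zaps_remaining > 0) {
		zaps_remaining--;
		return -EINVAL;
	}
	*pfn = vaddr >> 12;	/* pretend identity mapping with 4 KiB pages */
	return 0;
}

/* Stand-in for fixup_user_fault(): reports success even if we race later. */
static int mock_fixup_fault(unsigned long vaddr)
{
	(void)vaddr;
	fault_calls++;
	return 0;
}

/* Keep retrying the lookup; only a failing fault ends the loop with an error. */
static int follow_fault_pfn_retry(unsigned long vaddr, unsigned long *pfn)
{
	int ret;

	assert(pfn != NULL);

	for (;;) {
		ret = mock_follow_pfnmap(vaddr, pfn);
		if (!ret)
			return 0;	/* lookup succeeded, *pfn is valid */

		ret = mock_fixup_fault(vaddr);
		if (ret)
			return ret;	/* real fault error, give up */
		/* Fault reported success; loop and look the entry up again. */
	}
}
```

With zaps_remaining set to 3, the loop faults three times and the fourth
lookup succeeds, whereas a single fault-then-retry would have returned
-EINVAL on the second lookup. A real version would presumably also need a
way to bail out (e.g., on fatal signals) rather than spin forever.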

--
Cheers,

David