Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()

From: David Hildenbrand (Arm)

Date: Wed Apr 15 2026 - 04:13:51 EST


On 4/15/26 05:50, Yin Tirui wrote:
> Hi David,
>
> On 4/15/26 02:15, David Hildenbrand (Arm) wrote:
>> On 4/14/26 17:14, Yin Tirui wrote:
>>>
>>> I did a quick tree-wide grep:
>>> $ git grep -l "remap_pfn_range" | xargs grep -l "\.fault\s*="
>>> arch/powerpc/platforms/book3s/vas-api.c
>>> drivers/infiniband/hw/hfi1/file_ops.c
>>> drivers/uio/uio.c
>>> drivers/vfio/pci/vfio_pci_core.c
>>> fs/proc/vmcore.c
>>> security/selinux/selinuxfs.c
>>>
>>> It turns out there are two users of this "hybrid" approach in the kernel:
>>> 1. fs/proc/vmcore.c: It pre-maps via remap_pfn_range() but registers
>>> mmap_vmcore_fault().
>>> 2. arch/powerpc/platforms/book3s/vas-api.c: It pre-maps via
>>> remap_pfn_range(), but registers vas_mmap_fault().
>>>
>>>
>>> How would you suggest we proceed here?
>>
>> How about we populate PMDs in remap_pfn_range() only if !fault?
>
> Doing this would at most prevent VMAs with a ->fault() handler from
> getting huge mappings, which seems to have little negative impact.
>
> But wait, dynamic huge mappings are actually created through ->huge_fault().

If my memory serves me right, also fault() can nowadays install PMD
mappings.

For example, shmem only implements ->fault through shmem_fault()

finish_fault() after __do_fault() takes care of that (mapping through a
PMD if possible).

>
> I did a quick grep:
> $ git grep -l "remap_pfn_range" | xargs grep -l "\.huge_fault\s*="
> drivers/vfio/pci/vfio_pci_core.c
>
> This is a false positive. There is no case in the kernel that mixes
> remap_pfn_range() and ->huge_fault() on the same VMA.
>
> What if we use !huge_fault instead, disallowing remap_pfn_range() from
> populating PMDs if ->huge_fault() is provided?

I think we should just disallow any PMD mappings if we either have
->fault or ->huge_fault.

I would assume that ->huge_fault implies >fault, but let's rather be
save than sorry.

>
> Then, when we encounter a huge PMD, we know for sure whether it was
> installed through remap_pfn_range() (needs a deposited pgtable) or
> ->huge_fault() (no deposit needed, can be refaulted).
>
>>
>> Then, if we have !fault, we know that the PMD is from remap_pfn_range()
>> and has a disposed page table.
>>
>> Would that work?
>>
>
> So for Lorenzo's `has_deposited_pgtable()` helper, we could simply use:
>
> /* Huge PFN map without a huge_fault handler must deposit */
> if (vma_test(vma, VMA_PFNMAP_BIT))
> return !vma->vm_ops || !vma->vm_ops->huge_fault;

As mentioned above, also considering vma->vm_ops->fault;

>
>
> By the way, while auditing this, I noticed that
> drivers/gpu/drm/drm_gem_shmem_helper.c calls vmf_insert_pfn_pmd()
> directly from its normal ->fault() handler instead of implementing
> ->huge_fault().
> If we adopt the `!huge_fault` check above, this DRM driver would be
> wrongly classified as needing a deposit. It seems that DRM driver needs
> a minor refactoring to properly use ->huge_fault() to keep the MM
> semantics clean.
No, it's doing something that's allowed. If we call ->fault and there is
not PTE table, it may insert a PMD.

--
Cheers,

David