Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()

From: Yin Tirui

Date: Wed Apr 15 2026 - 11:33:55 EST


Hi David,

On 4/15/2026 4:13 PM, David Hildenbrand (Arm) wrote:
On 4/15/26 05:50, Yin Tirui wrote:
Hi David,

On 4/15/26 02:15, David Hildenbrand (Arm) wrote:
On 4/14/26 17:14, Yin Tirui wrote:
I did a quick tree-wide grep:
$ git grep -l "remap_pfn_range" | xargs grep -l "\.fault\s*="
arch/powerpc/platforms/book3s/vas-api.c
drivers/infiniband/hw/hfi1/file_ops.c
drivers/uio/uio.c
drivers/vfio/pci/vfio_pci_core.c
fs/proc/vmcore.c
security/selinux/selinuxfs.c

It turns out there are two users of this "hybrid" approach in the kernel:
1. fs/proc/vmcore.c: It pre-maps via remap_pfn_range() but registers
mmap_vmcore_fault().
2. arch/powerpc/platforms/book3s/vas-api.c: It pre-maps via
remap_pfn_range(), but registers vas_mmap_fault().


How would you suggest we proceed here?
How about we populate PMDs in remap_pfn_range() only if !fault?
Doing this would at most prevent VMAs with a ->fault() handler from
getting huge mappings, which seems to have little negative impact.

But wait, dynamic huge mappings are actually created through ->huge_fault().
If my memory serves me right, also fault() can nowadays install PMD
mappings.

For example, shmem only implements ->fault through shmem_fault()

finish_fault() after __do_fault() takes care of that (mapping through a
PMD if possible).


Ah, thanks for correcting my blind spot!


I did a quick grep:
$ git grep -l "remap_pfn_range" | xargs grep -l "\.huge_fault\s*="
drivers/vfio/pci/vfio_pci_core.c

This is a false positive. There is no case in the kernel that mixes
remap_pfn_range() and ->huge_fault() on the same VMA.

What if we use !huge_fault instead, disallowing remap_pfn_range() from
populating PMDs if ->huge_fault() is provided?
I think we should just disallow any PMD mappings if we either have
->fault or ->huge_fault.

I would assume that ->huge_fault implies >fault, but let's rather be
save than sorry.


Agreed. I think I have a clear idea of how to handle this now.


Then, when we encounter a huge PMD, we know for sure whether it was
installed through remap_pfn_range() (needs a deposited pgtable) or
->huge_fault() (no deposit needed, can be refaulted).

Then, if we have !fault, we know that the PMD is from remap_pfn_range()
and has a disposed page table.

Would that work?

So for Lorenzo's `has_deposited_pgtable()` helper, we could simply use:

/* Huge PFN map without a huge_fault handler must deposit */
if (vma_test(vma, VMA_PFNMAP_BIT))
return !vma->vm_ops || !vma->vm_ops->huge_fault;
As mentioned above, also considering vma->vm_ops->fault;


Will do.




By the way, while auditing this, I noticed that
drivers/gpu/drm/drm_gem_shmem_helper.c calls vmf_insert_pfn_pmd()
directly from its normal ->fault() handler instead of implementing
->huge_fault().
If we adopt the `!huge_fault` check above, this DRM driver would be
wrongly classified as needing a deposit. It seems that DRM driver needs
a minor refactoring to properly use ->huge_fault() to keep the MM
semantics clean.
No, it's doing something that's allowed. If we call ->fault and there is
not PTE table, it may insert a PMD.


Thanks for your clarification.


--
Yin Tirui