Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
From: Yin Tirui
Date: Wed Apr 15 2026 - 11:33:55 EST
Hi David,

On 4/15/2026 4:13 PM, David Hildenbrand (Arm) wrote:
> On 4/15/26 05:50, Yin Tirui wrote:
>> Hi David,
>>
>> On 4/15/26 02:15, David Hildenbrand (Arm) wrote:
>>> On 4/14/26 17:14, Yin Tirui wrote:
>>>> I did a quick tree-wide grep:
>>>>
>>>> $ git grep -l "remap_pfn_range" | xargs grep -l "\.fault\s*="
>>>> arch/powerpc/platforms/book3s/vas-api.c
>>>> drivers/infiniband/hw/hfi1/file_ops.c
>>>> drivers/uio/uio.c
>>>> drivers/vfio/pci/vfio_pci_core.c
>>>> fs/proc/vmcore.c
>>>> security/selinux/selinuxfs.c
>>>>
>>>> It turns out there are two users of this "hybrid" approach in the kernel:
>>>> 1. fs/proc/vmcore.c: It pre-maps via remap_pfn_range() but registers
>>>> mmap_vmcore_fault().
>>>> 2. arch/powerpc/platforms/book3s/vas-api.c: It pre-maps via
>>>> remap_pfn_range(), but registers vas_mmap_fault().
>>>>
>>>> How would you suggest we proceed here?
>>>
>>> How about we populate PMDs in remap_pfn_range() only if !fault?
>>
>> Doing this would at most prevent VMAs with a ->fault() handler from
>> getting huge mappings, which seems to have little negative impact.
>>
>> But wait, dynamic huge mappings are actually created through ->huge_fault().
>
> If my memory serves me right, also fault() can nowadays install PMD
> mappings.
>
> For example, shmem only implements ->fault through shmem_fault()
> finish_fault() after __do_fault() takes care of that (mapping through a
> PMD if possible).

Ah, thanks for correcting my blind spot!

>> I did a quick grep:
>>
>> $ git grep -l "remap_pfn_range" | xargs grep -l "\.huge_fault\s*="
>> drivers/vfio/pci/vfio_pci_core.c
>>
>> This is a false positive. There is no case in the kernel that mixes
>> remap_pfn_range() and ->huge_fault() on the same VMA.
>>
>> What if we use !huge_fault instead, disallowing remap_pfn_range() from
>> populating PMDs if ->huge_fault() is provided?
>
> I think we should just disallow any PMD mappings if we either have
> ->fault or ->huge_fault.
>
> I would assume that ->huge_fault implies ->fault, but let's rather be
> safe than sorry.

Agreed. I think I have a clear idea of how to handle this now.

>> Then, when we encounter a huge PMD, we know for sure whether it was
>> installed through remap_pfn_range() (needs a deposited pgtable) or
>> ->huge_fault() (no deposit needed, can be refaulted).
>
> As mentioned above, also considering vma->vm_ops->fault;
>
>> So for Lorenzo's `has_deposited_pgtable()` helper, we could simply use:
>>
>> /* Huge PFN map without a huge_fault handler must deposit */
>> if (vma_test(vma, VMA_PFNMAP_BIT))
>> 	return !vma->vm_ops || !vma->vm_ops->huge_fault;
>>
>> Would that work?
>
> Then, if we have !fault, we know that the PMD is from remap_pfn_range()
> and has a deposited page table.

Will do.

>> By the way, while auditing this, I noticed that
>> drivers/gpu/drm/drm_gem_shmem_helper.c calls vmf_insert_pfn_pmd()
>> directly from its normal ->fault() handler instead of implementing
>> ->huge_fault().
>>
>> If we adopt the `!huge_fault` check above, this DRM driver would be
>> wrongly classified as needing a deposit. It seems that DRM driver needs
>> a minor refactoring to properly use ->huge_fault() to keep the MM
>> semantics clean.
>
> No, it's doing something that's allowed. If we call ->fault and there is
> no PTE table, it may insert a PMD.

Thanks for your clarification.
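
For reference, the rule this thread converges on (remap_pfn_range() populates PMDs only when the VMA has neither ->fault nor ->huge_fault, so a huge PMD found in such a VMA implies a deposited page table) can be modelled in plain userspace C. This is only a rough sketch, not the patch itself: `struct mock_vm_ops`, `struct mock_vma`, `mock_has_any_fault_handler()` and `mock_has_deposited_pgtable()` are hypothetical stand-ins rather than kernel definitions, and the `pfnmap` flag merely stands in for the VMA_PFNMAP_BIT test quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct mock_vm_ops {
	int (*fault)(void);      /* models vm_ops->fault */
	int (*huge_fault)(void); /* models vm_ops->huge_fault */
};

struct mock_vma {
	bool pfnmap;                      /* models the VMA_PFNMAP_BIT test */
	const struct mock_vm_ops *vm_ops; /* may be NULL */
};

/* True if either handler could (re)install a mapping on demand. */
static bool mock_has_any_fault_handler(const struct mock_vma *vma)
{
	return vma->vm_ops && (vma->vm_ops->fault || vma->vm_ops->huge_fault);
}

/*
 * Assumed rule: a huge PMD in a PFN map with no fault handler of either
 * kind can only have come from remap_pfn_range(), so a page table was
 * deposited for it. Only the PFNMAP branch is modelled here; every other
 * kind of VMA simply yields false in this sketch.
 */
static bool mock_has_deposited_pgtable(const struct mock_vma *vma)
{
	return vma->pfnmap && !mock_has_any_fault_handler(vma);
}

/* Dummy handler so the mock ops tables have something to point at. */
static int mock_handler(void)
{
	return 0;
}

/* Self-check covering the three cases discussed in the thread. */
static void mock_self_check(void)
{
	const struct mock_vm_ops only_fault = { .fault = mock_handler };
	const struct mock_vm_ops only_huge = { .huge_fault = mock_handler };
	const struct mock_vma bare = { .pfnmap = true, .vm_ops = NULL };
	const struct mock_vma with_fault = { .pfnmap = true, .vm_ops = &only_fault };
	const struct mock_vma with_huge = { .pfnmap = true, .vm_ops = &only_huge };

	assert(mock_has_deposited_pgtable(&bare));        /* pure remap_pfn_range() user */
	assert(!mock_has_deposited_pgtable(&with_fault)); /* ->fault may insert the PMD */
	assert(!mock_has_deposited_pgtable(&with_huge));  /* ->huge_fault may insert it */
}
```

With both handlers considered, a driver that inserts PMDs from its normal ->fault() handler falls into the "no deposit" case without needing any refactoring, which matches the clarification above.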
--
Yin Tirui