Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
From: Yin Tirui
Date: Tue Apr 14 2026 - 23:56:34 EST
Hi David,
On 4/15/26 02:15, David Hildenbrand (Arm) wrote:
> On 4/14/26 17:14, Yin Tirui wrote:
>> On 4/14/26 17:44, David Hildenbrand (Arm) wrote:
>>>
>>> I mean, we need some indication to know also during folio splitting
>>> whether we can just discard the PMD, as we can refault it later, or
>>> whether we really have to install a PTE table.
>>>
>>> What if someone used remap_pfn_range() on some part of the VMA, and
>>> faults on another part?
>>>
>>> Doesn't really work.
>>>
>>> Do we have users of remap_pfn_range() that have ->fault set? If not, we
>>> should probably just disallow this combination.
>>
>> I did a quick tree-wide grep:
>> $ git grep -l "remap_pfn_range" | xargs grep -l "\.fault\s*="
>> arch/powerpc/platforms/book3s/vas-api.c
>> drivers/infiniband/hw/hfi1/file_ops.c
>> drivers/uio/uio.c
>> drivers/vfio/pci/vfio_pci_core.c
>> fs/proc/vmcore.c
>> security/selinux/selinuxfs.c
>>
>> It turns out there are two users of this "hybrid" approach in the kernel:
>> 1. fs/proc/vmcore.c: It pre-maps via remap_pfn_range() but registers
>> mmap_vmcore_fault().
>> 2. arch/powerpc/platforms/book3s/vas-api.c: It pre-maps via
>> remap_pfn_range(), but registers vas_mmap_fault().
>>
>>>
>>> Then we know for sure whether something was installed through
>>> remap_pfn_range() or through a fault handler.
>>>
>>
>> How would you suggest we proceed here?
>
> How about we populate PMDs in remap_pfn_range() only if !fault?
Doing this would at most prevent VMAs with a ->fault() handler from
getting huge mappings, which seems to have little negative impact.
But wait, dynamic huge mappings are actually created through ->huge_fault().
I did a quick grep:
$ git grep -l "remap_pfn_range" | xargs grep -l "\.huge_fault\s*="
drivers/vfio/pci/vfio_pci_core.c
This is a false positive. There is no case in the kernel that mixes
remap_pfn_range() and ->huge_fault() on the same VMA.
What if we use !huge_fault instead, disallowing remap_pfn_range() from
populating PMDs if ->huge_fault() is provided?
Then, when we encounter a huge PMD, we know for sure whether it was
installed through remap_pfn_range() (needs a deposited pgtable) or
->huge_fault() (no deposit needed, can be refaulted).
>
> Then, if we have !fault, we know that the PMD is from remap_pfn_range()
> and has a disposed page table.
>
> Would that work?
>
So for Lorenzo's `has_deposited_pgtable()` helper, we could simply use:
/* Huge PFN map without a huge_fault handler must deposit */
if (vma_test(vma, VMA_PFNMAP_BIT))
return !vma->vm_ops || !vma->vm_ops->huge_fault;
By the way, while auditing this, I noticed that
drivers/gpu/drm/drm_gem_shmem_helper.c calls vmf_insert_pfn_pmd()
directly from its normal ->fault() handler instead of implementing
->huge_fault().
If we adopt the `!huge_fault` check above, this DRM driver would be
wrongly classified as needing a deposit. It seems that DRM driver needs
a minor refactoring to properly use ->huge_fault() to keep the MM
semantics clean.
--
Yin Tirui