Re: [PATCH] vfio: Request THP-aligned mmap for device fds
From: Alex Williamson
Date: Tue Jun 16 2026 - 18:32:07 EST
On Tue, 16 Jun 2026 14:01:29 -0400
Anthony Pighin <anthony.pighin@xxxxxxxxx> wrote:
> VFIO PCI devices support PMD-sized page table entries for BAR mappings
> via their huge_fault handler (vfio_pci_mmap_huge_fault). However, the
> VFIO device file_operations never provided a get_unmapped_area callback
> to request PMD-aligned virtual address placement from the mmap address
> allocator.
>
> Before commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> get_unmapped_area"), this was masked by a bug introduced in commit
> ed48e87c7df3 ("thp: add thp_get_unmapped_area_vmflags()") which
> inadvertently applied THP alignment to all file-backed mappings,
> regardless of whether they provided a get_unmapped_area callback.
>
> When commit 34d7cf637c43 ("mm: don't try THP alignment for FS without
> get_unmapped_area") correctly restricted THP alignment to anonymous
> mappings and files that explicitly opt in via get_unmapped_area, VFIO BAR
> mappings lost their PMD-aligned placement. Since the huge_fault handler
> requires both the VMA start address and the physical PFN to be
> PMD-aligned, unaligned VMAs force a fallback to 4KB page faults.
>
> For example, a 2GiB BAR results in 524,288 individual page faults
> instead of 1,024 PMD-sized faults, increasing the VFIO_IOMMU_MAP_DMA
> pinning time by orders of magnitude -- a regression directly visible to
> KVM guests during PCI device initialization.
>
> Fix this by providing a get_unmapped_area callback in vfio_device_fops,
> following the same pattern used by ext4, xfs, btrfs, fuse, and other
> subsystems that benefit from THP-aligned placement.
The trouble is that PMD alignment isn't right either: your 1,024 PMD
faults on a 2GiB BAR would be 2 faults on x86_64 with PUD (1GiB) mappings.
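The fault counts above can be checked with a simplified userspace model. This is not the kernel's actual fault path. It walks a BAR and, at each step, uses the largest x86_64 mapping size (PUD 1GiB, PMD 2MiB, PTE 4KiB) for which the virtual address and the physical address are both aligned and enough of the BAR remains. The helper name `count_faults` and the example addresses are hypothetical. The model assumes 4KiB base pages and ignores fault-around, VMA boundary checks and other kernel details.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 mapping sizes with 4KiB base pages. */
#define SZ_4K (UINT64_C(1) << 12) /* PTE-level mapping */
#define SZ_2M (UINT64_C(1) << 21) /* PMD-level mapping */
#define SZ_1G (UINT64_C(1) << 30) /* PUD-level mapping */

/*
 * Hypothetical, simplified model: map 'len' bytes of a BAR at virtual
 * address 'va', backed by physical address 'pa'. At each offset, take the
 * largest mapping size for which both addresses are aligned and the
 * remaining length is large enough. Return the number of faults taken.
 */
static uint64_t count_faults(uint64_t va, uint64_t pa, uint64_t len)
{
	static const uint64_t sizes[] = { SZ_1G, SZ_2M, SZ_4K };
	uint64_t off = 0, faults = 0;

	/* The model only handles page-granular, page-aligned mappings, so a
	 * 4KiB step always succeeds and the walk terminates. */
	assert(va % SZ_4K == 0 && pa % SZ_4K == 0 && len % SZ_4K == 0);

	while (off < len) {
		for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
			uint64_t sz = sizes[i];

			if ((va + off) % sz == 0 && (pa + off) % sz == 0 &&
			    len - off >= sz) {
				off += sz;  /* one fault maps this whole chunk */
				faults++;
				break;
			}
		}
	}
	return faults;
}
```

Take a 2GiB BAR at a 1GiB-aligned physical address. In this model, a PUD-aligned VMA needs 2 faults. A VMA that is only PMD-aligned needs 1,024. A VMA that is merely 4KiB-aligned needs 524,288.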
QEMU has forced the alignment to make it optimal for some time[1], so
userspace VMMs already have options. It seems you were previously
getting lucky.
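One generic way for a VMM to get this alignment without kernel help is to over-reserve address space, trim the reservation to an aligned window, and then map the device over that window. The sketch below assumes Linux and glibc. It also assumes a power-of-two alignment of at least the page size and a size that is a multiple of the page size. `reserve_aligned` is a hypothetical helper, and the sketch is not a description of what the QEMU commit in [1] actually does.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve 'size' bytes of PROT_NONE address space
 * starting at a multiple of 'align'. Returns MAP_FAILED on error. The
 * caller is expected to mmap() the real object over it with MAP_FIXED.
 */
static void *reserve_aligned(size_t size, size_t align)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	assert(align >= page && (align & (align - 1)) == 0);
	assert(size > 0 && size % page == 0);

	/* Over-reserve by 'align' so an aligned window of 'size' must fit. */
	size_t span = size + align;
	uint8_t *base = mmap(NULL, span, PROT_NONE,
			     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* Round the start up to the next multiple of 'align'. */
	uintptr_t start = ((uintptr_t)base + align - 1) & ~((uintptr_t)align - 1);
	uint8_t *aligned = (uint8_t *)start;

	/* Release the unused head and tail of the reservation. */
	if (aligned > base)
		munmap(base, (size_t)(aligned - base));
	size_t tail = (size_t)((base + span) - (aligned + size));
	if (tail)
		munmap(aligned + size, tail);

	return aligned;
}
```

A caller would then place the BAR over the reservation with something like `mmap(addr, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, device_fd, region_offset)`. Here `device_fd` and `region_offset` stand in for the VFIO device fd and the region offset reported by VFIO_DEVICE_GET_REGION_INFO. PCI BARs are naturally aligned to their size. Choosing the alignment as the largest supported mapping size that does not exceed the BAR (1GiB for a 2GiB BAR on x86_64) therefore lets the huge_fault path use PUD mappings.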
Peter Xu was working on a more comprehensive solution[2] late last
year, but it seems there was an objection to the
file_operations.get_mapping_order() proposal before Plumbers, and the
thread hasn't been revived since.
Gentle bump to Peter and Willy: perhaps we could resurrect that
effort. Thanks,
Alex
[1] https://gitlab.com/qemu-project/qemu/-/commit/00b519c0bca0
[2] https://lore.kernel.org/all/20251204151003.171039-1-peterx@xxxxxxxxxx/