Mapping a large PCI resource
From: Constantine Gavrilov
Date: Sun Jan 26 2025 - 08:07:15 EST
Hi:
I am writing software for a PCIe device. The fast I/O path uses
memory-mapped PCIe BAR access in user space, while the driver software
tracks object creation and destruction, separating applications from
each other and facilitating cleanup on process exit.
The PCIe BARs are quite large (multiple TiB in size) and need to be
mapped into the application's virtual address space. The current
standard approach to implementing mmap() is to use the
remap_pfn_range() kernel API, which maps the physical address space in
4 KiB page chunks on the x86_64 architecture. While this works, it
takes significant time: my tests show that it takes 30 seconds on a
modern CPU for a 128 GiB physical range. Needless to say, the memory
map times for a 10 TiB resource would be absolutely prohibitive.
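For reference, the 4 KiB path I am describing looks roughly like the
sketch below; "struct mydev" and its bar_phys field are hypothetical
placeholders for the real driver state, and the BAR is assumed to be
mapped uncached.

#include <linux/fs.h>
#include <linux/mm.h>

struct mydev {
	phys_addr_t bar_phys;	/* physical start of the large PCIe BAR */
};

static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct mydev *dev = filp->private_data;
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pfn = (dev->bar_phys >> PAGE_SHIFT) + vma->vm_pgoff;

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* Installs one PTE per 4 KiB page over the whole range. */
	return remap_pfn_range(vma, vma->vm_start, pfn, size,
			       vma->vm_page_prot);
}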
I am looking into a way to force 2 MiB or 1 GiB mappings for a given
resource. I do not think the current kernel API allows doing this
easily: it would mean reimplementing significant chunks of the page
table creation code in the code doing the memory map (since some of
the required functions are not public API). Additionally, I believe
the kernel code has some asserts when encountering compound pages for
file objects that are not HUGETLB or DEVDAX, and those would need to
be resolved as well.
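The closest in-tree precedent I see is the device DAX huge_fault path.
A rough sketch of what that approach would look like from a driver is
below, but it is only a sketch: the huge_fault signature has changed
across kernel versions (an order argument in recent kernels, an enum
page_entry_size before that), struct mydev / bar_phys are the same
hypothetical placeholders as above, the BAR and the VMA are assumed to
be 2 MiB aligned, and the asserts mentioned above may well get in the
way for a regular character device file.

#include <linux/huge_mm.h>
#include <linux/mm.h>
#include <linux/pfn_t.h>

static vm_fault_t mydev_huge_fault(struct vm_fault *vmf, unsigned int order)
{
	struct mydev *dev = vmf->vma->vm_file->private_data;
	unsigned long addr = vmf->address & PMD_MASK;
	unsigned long off = (addr - vmf->vma->vm_start) +
			    (vmf->vma->vm_pgoff << PAGE_SHIFT);
	pfn_t pfn = phys_to_pfn_t(dev->bar_phys + off, PFN_DEV | PFN_MAP);

	if (order != PMD_ORDER)
		return VM_FAULT_FALLBACK;	/* core retries with 4 KiB faults */

	return vmf_insert_pfn_pmd(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE);
}

static const struct vm_operations_struct mydev_vm_ops = {
	.huge_fault	= mydev_huge_fault,
	/* .fault = ... 4 KiB fallback for unaligned ranges ... */
};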
Thus, I am interested in having such support in the kernel itself,
where a huge page mapping would be as easy to achieve as calling
remap_pfn_range() in the driver code.
Two questions:
1. Was there a specific thought behind forcing the resource mapping to 4 KiB pages?
2. Perhaps I missed something and there is a way to force PCIe
resource mapping using huge pages?
Thanks.
--
----------------------------------------
Constantine Gavrilov
Systems Architect
----------------------------------------