Re: [RFC v2 PATCH 00/10] vfio/pci: Add mmap() for DMABUFs

From: Matt Evans

Date: Fri Mar 13 2026 - 09:49:09 EST


Hi Christian,

On 13/03/2026 09:21, Christian König wrote:
> On 3/12/26 19:45, Matt Evans wrote:
>> Hi all,
>>
>>
>> There were various suggestions in the September 2025 thread "[TECH
>> TOPIC] vfio, iommufd: Enabling user space drivers to vend more
>> granular access to client processes" [0], and LPC discussions, around
>> improving the situation for multi-process userspace driver designs.
>> This RFC series implements some of these ideas.
>>
>> (Thanks for feedback on v1! Revised series, with changes noted
>> inline.)
>>
>> Background: Multi-process USDs
>> ==============================
>>
>> The userspace driver scenario discussed in that thread involves a
>> primary process driving a PCIe function through VFIO/iommufd, which
>> manages the function-wide ownership/lifecycle. The function is
>> designed to provide multiple distinct programming interfaces (for
>> example, several independent MMIO register frames in one function),
>> and the primary process delegates control of these interfaces to
>> multiple independent client processes (which do the actual work).
>> This scenario clearly relies on a HW design that provides appropriate
>> isolation between the programming interfaces.
>>
>> The two key needs are:
>>
>> 1. Mechanisms to safely delegate a subset of the device MMIO
>> resources to a client process without over-sharing wider access
>> (or influence over whole-device activities, such as reset).
>>
>> 2. Mechanisms to allow a client process to do its own iommufd
>> management w.r.t. its address space, in a way that's isolated
>> from DMA relating to other clients.
>>
>>
>> mmap() of VFIO DMABUFs
>> ======================
>>
>> This RFC addresses #1 in "vfio/pci: Support mmap() of a VFIO DMABUF",
>> implementing the proposals in [0] to add mmap() support to the
>> existing VFIO DMABUF exporter.
>>
>> This enables a userspace driver to define DMABUF ranges corresponding
>> to sub-ranges of a BAR, and grant a given client (via a shared fd)
>> the capability to access (only) those sub-ranges. The VFIO device fds
>> would be kept private to the primary process. All the client can do
>> with that fd is map (or iomap via iommufd) that specific subset of
>> resources, and the impact of bugs/malice is contained.
>>
>> (We'll follow up on #2 separately, as a related-but-distinct problem.
>> PASIDs are one way to achieve per-client isolation of DMA; another
>> could be sharing of a single IOVA space via 'constrained' iommufds.)
>>
>>
>> New in v2: To achieve this, the existing VFIO BAR mmap() path is
>> converted to use DMABUFs behind the scenes, in "vfio/pci: Convert BAR
>> mmap() to use a DMABUF" plus new helper functions, as Jason/Christian
>> suggested in the v1 discussion [3].
>>
>> This means:
>>
>> - Both regular and new DMABUF BAR mappings share the same vm_ops,
>> i.e. mmap()ing DMABUFs is a smaller change on top of the existing
>> mmap().
>>
>> - The zapping of mappings occurs via vfio_pci_dma_buf_move(), and the
>> vfio_pci_zap_bars() originally paired with the _move()s can go
>> away. Each DMABUF has a unique address_space.
>>
>> - It's a step towards future iommufd VFIO Type1 emulation
>> implementing P2P, since iommufd can now get a DMABUF from a VA that
>> it's mapping for IO; the VMAs' vm_file is that of the backing
>> DMABUF.
>>
>>
>> Revocation/reclaim
>> ==================
>>
>> Mapping a BAR subset is useful, but the lifetime of access granted to
>> a client needs to be managed well. A protocol between the primary
>> process and the client can indicate when the client is done and the
>> resources can be reused elsewhere, but cleanup can't rely on the
>> client's cooperation alone (a client may crash or misbehave).
>>
>> For robustness, we enable the driver to make the resources
>> guaranteed-inaccessible when it chooses, so that it can re-assign them
>> to other uses in future.
>>
>> "vfio/pci: Permanently revoke a DMABUF on request" adds a new VFIO
>> device fd ioctl, VFIO_DEVICE_PCI_DMABUF_REVOKE. This takes a DMABUF
>> fd parameter previously exported (from that device!) and permanently
>> revokes the DMABUF. This notifies/detaches importers, zaps PTEs for
>> any mappings, and guarantees no future attachment/import/map/access is
>> possible by any means.
>>
>> A primary driver process would use this operation when the client's
>> tenure ends to reclaim "loaned-out" MMIO interfaces, at which point
>> the interfaces could be safely re-used.
>>
>> New in v2: ioctl() on VFIO driver fd, rather than DMABUF fd. A DMABUF
>> is revoked using code common to vfio_pci_dma_buf_move(), selectively
>> zapping mappings (after waiting for completion on the
>> dma_buf_invalidate_mappings() request).
>>
>>
>> BAR mapping access attributes
>> =============================
>>
>> Inspired by Alex [Mastro] and Jason's comments in [0] and Mahmoud's
>> work in [1] with the goal of controlling CPU access attributes for
>> VFIO BAR mappings (e.g. WC), we can decorate DMABUFs with access
>> attributes that are then used by a mapping's PTEs.
>>
>> I've proposed reserving bits in struct vfio_device_feature_dma_buf's
>> flags field to specify an attribute for its ranges. Although that
>> keeps the (UAPI) struct unchanged, it means all ranges in a DMABUF
>> share the same attribute. I feel a single attribute-per-mmap()
>> relation is logical and reasonable. An application
>> can also create multiple DMABUFs to describe any BAR layout and mix of
>> attributes.
>>
>>
>> Tests
>> =====
>>
>> (Still sharing the [RFC ONLY] userspace test/demo program for context,
>> not for merge.)
>>
>> It illustrates & tests various map/revoke cases, but doesn't use the
>> existing VFIO selftests and relies on a (tweaked) QEMU EDU function.
>> I'm (still) working on integrating the scenarios into the existing
>> VFIO selftests.
>>
>> This code has been tested with mapping DMABUFs of single and multiple
>> ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
>> 0, revocation, shutdown/cleanup scenarios, and hugepage mappings, all
>> of which appear to work correctly. I've also lightly tested WC
>> mappings (by observing that the resulting PTEs carry the correct
>> attributes).
>>
>>
>> Fin
>> ===
>>
>> v2 is based on next-20260310 (to build on Leon's recent series
>> "vfio: Wait for dma-buf invalidation to complete" [2]).
>>
>>
>> Please share your thoughts! I'd like to de-RFC if we feel this
>> approach is now sound.
>
> I only skimmed over it, but at least off-hand I couldn't find anything fundamentally wrong.

Thank you!

> The locking order seems to change in patch #6. In general I strongly recommend enabling lockdep while testing anyway, but especially when I see such changes.

Definitely +1 on testing with lockdep.

Note that patch #6 doesn't [intend to] change the locking; the naming of
the existing vfio_pci_zap_and_down_write_memory_lock() is potentially
confusing because _really_ it's
vfio_pci_down_write_memory_lock_and_zap(). Patch #6 is replacing that
with _just_ the existing down_write(&memory_lock) part.

(FWIW, lockdep's happy when running the test scenarios on this series.)

> In addition to that, it might also be a good idea to have a lockdep initcall function which defines the locking order that all the VFIO code should follow.
>
> See dma_resv_lockdep() for an example of how to do that. Especially with mmap support and all the locks involved, having something like that has proven to be good practice.

That's a good suggestion; I'll investigate, and thanks for the pointer.
I spent time stepping through the locking particularly in the revoke
path, and automation here would be pretty useful if possible.


Thanks and regards,


Matt


>
> Regards,
> Christian.
>
>>
>>
>> Many thanks,
>>
>>
>> Matt
>>
>>
>>
>> References:
>>
>> [0]: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@xxxxxx/
>> [1]: https://lore.kernel.org/all/20250804104012.87915-1-mngyadam@xxxxxxxxx/
>> [2]: https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566ad@houat/T/#m310cd07011e3a1461b6fda45e3f9b886ba76571a
>> [3]: https://lore.kernel.org/all/20260226202211.929005-1-mattev@xxxxxxxx/
>>
>> --------------------------------------------------------------------------------
>> Changelog:
>>
>> v2: Respin based on the feedback/suggestions:
>>
>> - Transform the existing VFIO BAR mmap path to also use DMABUFs behind
>> the scenes, and then simply share that code for explicitly-mapped
>> DMABUFs.
>>
>> - Refactor the export itself out of vfio_pci_core_feature_dma_buf,
>> sharing it via a new vfio_pci_core_mmap_prep_dmabuf helper used by
>> the regular VFIO mmap to create a DMABUF.
>>
>> - Revoke buffers using a VFIO device fd ioctl.
>>
>> v1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@xxxxxxxx/
>>
>>
>> Matt Evans (10):
>> vfio/pci: Set up VFIO barmap before creating a DMABUF
>> vfio/pci: Clean up DMABUFs before disabling function
>> vfio/pci: Add helper to look up PFNs for DMABUFs
>> vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
>> vfio/pci: Convert BAR mmap() to use a DMABUF
>> vfio/pci: Remove vfio_pci_zap_bars()
>> vfio/pci: Support mmap() of a VFIO DMABUF
>> vfio/pci: Permanently revoke a DMABUF on request
>> vfio/pci: Add mmap() attributes to DMABUF feature
>> [RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test
>>
>> drivers/vfio/pci/Kconfig | 3 +-
>> drivers/vfio/pci/Makefile | 3 +-
>> drivers/vfio/pci/vfio_pci_config.c | 18 +-
>> drivers/vfio/pci/vfio_pci_core.c | 123 +--
>> drivers/vfio/pci/vfio_pci_dmabuf.c | 425 +++++++--
>> drivers/vfio/pci/vfio_pci_priv.h | 46 +-
>> include/uapi/linux/vfio.h | 42 +-
>> tools/testing/selftests/vfio/Makefile | 1 +
>> .../vfio/standalone/vfio_dmabuf_mmap_test.c | 837 ++++++++++++++++++
>> 9 files changed, 1339 insertions(+), 159 deletions(-)
>> create mode 100644 tools/testing/selftests/vfio/standalone/vfio_dmabuf_mmap_test.c
>>
>