Re: [RFC PATCH 3/7] vfio/pci: Support mmap() of a DMABUF

From: Matt Evans

Date: Fri Feb 27 2026 - 14:42:58 EST


Hi Jason + Christian,

On 27/02/2026 12:51, Jason Gunthorpe wrote:
> On Fri, Feb 27, 2026 at 11:09:31AM +0100, Christian König wrote:
>
>> When a DMA-buf just represents a linear piece of BAR which is
>> map-able through the VFIO FD anyway then the right approach is to
>> just re-direct the mapping to this VFIO FD.

We think limiting this to one range per DMABUF isn't enough;
supporting multiple ranges would be a real benefit.

Bumping vm_pgoff and then reusing vfio_pci_mmap_ops is a really nice
suggestion for the simplest case, but it can't support multiple ranges:
the .fault() needs to be aware of the non-linear DMABUF layout.

> I actually would like to go the other way and have VFIO always have a
> DMABUF under the VMA's it mmaps because that will make it easy to
> finish the type1 emulation which requires finding dmabufs for the
> VMAs.
>
>> It can be that you want additional checks (e.g. if the DMA-buf is
>> revoked) in which case you would need to override the vma->vm_ops,
>> but then just do the access checks and call the vfio_pci_mmap_ops to
>> get the actually page fault handling done.
>
> It isn't that simple, the vm_ops won't have a way to get back to the
> dmabuf from the vma to find the per-fd revoke flag to check it.

It sounds like the suggestion is just to reuse vfio_pci_mmap_*fault():
install "interposer" vm_ops whose new 'fault_but_check_revoke()' calls
down to the existing vfio_pci_mmap_*fault() after fishing the DMABUF
out of vm_private_data. (Like the proposed
vfio_pci_dma_buf_mmap_huge_fault() does.)
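
For concreteness, I read that suggestion as roughly the following
(sketch only -- the struct and field names are my assumptions, not from
the patch, and the unsynchronised revoked test is exactly the problem
discussed below):

  static vm_fault_t fault_but_check_revoke(struct vm_fault *vmf)
  {
          struct vfio_pci_dma_buf *priv = vmf->vma->vm_private_data;

          /* Unsynchronised test: a revoke can race in after this. */
          if (priv->revoked)
                  return VM_FAULT_SIGBUS;

          /* Existing handler re-takes memory_lock internally. */
          return vfio_pci_mmap_fault(vmf);
  }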

Putting aside for a moment the above point of needing a new .fault()
able to find a PFN across >1 range, how would the test of the revoked
flag work
w.r.t. synchronisation and protecting against a racing revoke? It's not
safe to take memory_lock, test revoked, unlock, then hand over to the
existing vfio_pci_mmap_*fault() -- which re-takes the lock. I'm not
quite seeing how we could reuse the existing vfio_pci_mmap_*fault(),
TBH. I did briefly consider refactoring that existing .fault() code,
but that makes both paths uglier.

To summarise, I think we still:
 - need a new fops->mmap() to link vfio_pci_dma_buf into
   vm_private_data, and to determine WC attrs
 - need a new vm_ops->fault() to test dmabuf->revoked/status and
   determine map vs. fault with memory_lock held, and to determine the
   PFN from >1 DMABUF ranges
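
i.e. roughly this shape (again a sketch only -- the range-walk helper
and the field names are invented for illustration):

  static vm_fault_t vfio_pci_dma_buf_fault(struct vm_fault *vmf)
  {
          struct vfio_pci_dma_buf *priv = vmf->vma->vm_private_data;
          struct vfio_pci_core_device *vdev = priv->vdev;
          vm_fault_t ret = VM_FAULT_SIGBUS;
          unsigned long pfn;

          down_read(&vdev->memory_lock);
          /* Revoke test and PFN insertion under one lock hold. */
          if (priv->revoked)
                  goto out;
          /* Walk the (possibly >1) ranges to turn pgoff into a PFN;
           * this helper is hypothetical. */
          if (vfio_pci_dma_buf_pgoff_to_pfn(priv, vmf->pgoff, &pfn))
                  goto out;
          ret = vmf_insert_pfn(vmf->vma, vmf->address, pfn);
  out:
          up_read(&vdev->memory_lock);
          return ret;
  }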

>>> + unmap_mapping_range(priv->dmabuf->file->f_mapping,
>>> + 0, priv->size, 1);
>>
>> When you need to use unmap_mapping_range() then you usually share
>> the address space object between the file descriptor exporting the
>> DMA-buf and the DMA-buf fd itself.
>
> Yeah, this becomes problematic. Right now there is a single address
> space per vfio-device and the invalidation is global.
>
> Possibly for this use case you can keep that and do a global unmap and
> rely on fault to restore the mmaps that were not revoked.

Hm, that would be functional, but we should consider huge BARs with a
lot of PTEs (even huge ones); zapping all BARs might noticeably disturb
other clients. But please see my query below: if we could zap just the
resource being reclaimed, that would be preferable.

>> Otherwise functions like vfio_pci_zap_bars() doesn't work correctly
>> any more and that usually creates a huge bunch of problems.

I'd reasoned it was OK for the DMABUF to have its own unique address
space -- even though IIUC that means an unmap_mapping_range() by
vfio_pci_core_device won't affect a DMABUF's mappings -- because
anything that needs to zap a BAR _also_ must already plan to notify
DMABUF importers via vfio_pci_dma_buf_move(). And then,
vfio_pci_dma_buf_move() will zap the mappings.

Are there paths that _don't_ always pair vfio_pci_zap_bars() with a
vfio_pci_dma_buf_move()?

I'm sure I'm missing something, so here's a question phrased as a
statement:
The only way that mappings could be missed would be if some path
forgets to ...buf_move() when zapping the BARs, but that'd be a
problem for importers regardless of whether they can now also be
mmap()ed, no?

I don't want to flout convention for the sake of it, and am keen to
learn more, so please gently explain in more detail: Why must we
associate the DMABUFs with the VFIO address space [by sharing the AS
object between the VFIO fd exporting the DMABUF and the DMABUF fd]?


Many thanks,


Matt