Re: [RFC PATCH 0/5] vfio/pci: Support ZONE_DEVICE-backed P2P Registration

From: Jason Gunthorpe

Date: Thu Jun 11 2026 - 18:15:18 EST


On Thu, Jun 11, 2026 at 02:40:17PM +0000, Pranjal Shrivastava wrote:
> On Wed, Jun 10, 2026 at 01:28:48PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 10, 2026 at 03:18:48PM +0000, Pranjal Shrivastava wrote:
> >
> > > Users utilize the standard sysfs p2pmem/allocate interface for managing
> > > memory slices once a BAR is registered.
> >
> > I'm shocked someone wants to use that API. What are you expecting to do
> > with it?
>
> Our primary use-case is PCIe BAR (DDR / HBM) -> NFS via P2PDMA while the
> PCIe device is managed by a user-space driver based on vfio-pci. While
> kernel drivers (e.g. drm) can register BARs with ZONE_DEVICE natively to
> enable this, VFIO currently lacks an equivalent mechanism.

I mean the weird sysfs mmap API. It is only useful if the device is
basically pure memory with no functionality. You can't even learn which
MMIO offset the returned allocation corresponds to, so it is almost
completely useless.

NVMe could use it because the CMB is pure memory and you reference it by
its MMIO address, but that doesn't apply to VFIO.

> > > An alternative implementation has been explored which integrates with the
> > > ongoing VFIO DMABUF-mmap refactor [1]. In that approach, rather than
> > > registering a BAR as a system-wide P2P provider, VFIO optionally
> > > allocates ZONE_DEVICE pages only for specifically exported DMABUFs via a
> > > new VFIO_DMA_BUF_FLAG_ALLOC_STRUCT_PAGES flag.
> >
> > That's probably more sensible but you can't have a DMABUF mmap
> > actually install non-special memory. The native vfio mmap still can,
> > but not mmap on the dmabuf fd. That's still workable, just keep it in
> > mind.
>
> Ack. I guess, we could have a separate mmap path in case of BARs that are
> struct page backed which doesn't go through the dmabuf exporter.

The dmabuf export is perfectly fine, you just have to think very
carefully about the mmap path.

I suppose if you build the proper revocation fence for zone device
pages as part of the vfio implementation it would be OK for dmabuf
mmap to expose them as well since it would have the right lifecycle
model.

That's the tricky thing with zone_device, you have to be careful to
wait for all the page references to be put back at all the right
times.
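
To make the ordering concrete, here is a rough userspace model of that
lifecycle. It is only a sketch: every name in it (model_pgmap,
model_page_get, model_revoke_begin, ...) is hypothetical and not a kernel
API, it is single-threaded, and it polls where a real implementation
would presumably sleep until the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_pgmap {
	unsigned long refs;	/* outstanding page references */
	bool revoked;		/* once set, no new references are handed out */
};

/* Take a page reference, as a P2P DMA user would. Fails after revocation. */
static bool model_page_get(struct model_pgmap *p)
{
	if (p->revoked)
		return false;
	p->refs++;
	return true;
}

/* Drop a reference previously taken with model_page_get(). */
static void model_page_put(struct model_pgmap *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* Teardown step 1: stop handing out new references. */
static void model_revoke_begin(struct model_pgmap *p)
{
	p->revoked = true;
}

/*
 * Teardown step 2: the memory may only be reused once this returns true,
 * i.e. revocation has started and every reference has been put back.
 */
static bool model_revoke_done(const struct model_pgmap *p)
{
	return p->revoked && p->refs == 0;
}
```

The point of the two steps is that revocation is not complete when new
users are blocked, only when the old ones have all let go.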

Come to think of it, since the sysfs API cannot do that in the way
VFIO wants, I actually think you can't use it.

Jason