Re: [Qemu-devel] [PATCH v2 2/2] virtio-pmem: Add virtio pmem driver

From: Dan Williams
Date: Wed Oct 17 2018 - 15:37:07 EST


On Wed, Oct 17, 2018 at 12:11 PM Pankaj Gupta <pagupta@xxxxxxxxxx> wrote:
>
>
>
> > On Fri, Oct 12, 2018 at 10:01 PM Pankaj Gupta <pagupta@xxxxxxxxxx> wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> >
> > Before we can move forward with this driver we need additional
> > filesystem enabling to detect when the backing device is fronting DAX
> > pmem or a paravirtualized page cache through virtio-pmem. Any
> > interface that requires fsync() and a round trip to the hypervisor to
> > flush host page cache is not DAX.
>
> I saw your proposal[1] for the new mmap flag MAP_DIRECT. IIUC, mapping should fail
> for MAP_DIRECT if it requires explicit flush or buffer indirection. So, if we disable
> the MAP_SYNC flag for virtio-pmem, this should fail MAP_DIRECT as well? Otherwise,
> without MAP_DIRECT, virtio-pmem should default to the VIRTIO flush mechanism.

Right, although I wouldn't worry about MAP_DIRECT in the short term
since we're still discussing what the upstream interface should be.
Regardless of whether MAP_DIRECT is specified or not, the virtio-flush
mechanism would always be used for virtio-pmem. I.e. there is no
possibility to get full DAX operation with virtio-pmem, only the
page-cache bypass subset.

Taking a look at where we could inject this check for filesystems,
it's a bit awkward to do it in xfs_file_mmap(), for example, because
we do not have the backing device for the extents of the inode there.
So at a minimum you would need to investigate calling
xfs_inode_supports_dax() from that path and teaching it about a new
dax_device flag. I'm thinking the dax_device flag should be called
DAXDEV_BUFFERED to indicate the presence of software buffering on a
device that otherwise supports bypassing the local page cache.
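
To make the flag idea concrete, here is a rough userspace model of the
check, not kernel code. Only the name DAXDEV_BUFFERED comes from the
proposal above; the sketch_* types and helpers, the bit values, and the
choice to reject a MAP_SYNC request with -EOPNOTSUPP are all assumptions
made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical flag bits for a modelled dax_device. DAXDEV_BUFFERED is
 * the name proposed in this thread; the bit positions are made up.
 */
enum sketch_daxdev_flags {
	SKETCH_DAXDEV_ALIVE    = 1 << 0,	/* device is usable */
	SKETCH_DAXDEV_BUFFERED = 1 << 1,	/* software buffering sits behind the device */
};

/* Stand-in for the kernel's dax_device; only the flags word is modelled. */
struct sketch_dax_device {
	unsigned long flags;
};

/* True when writes still need an explicit flush to become durable. */
static bool sketch_dax_buffered(const struct sketch_dax_device *dax_dev)
{
	return dax_dev != NULL && (dax_dev->flags & SKETCH_DAXDEV_BUFFERED) != 0;
}

/*
 * Assumed mmap-time policy: a buffered device may still be mapped
 * (page-cache bypass works), but a request for synchronous-fault
 * semantics (MAP_SYNC) is refused because durability needs a flush.
 * Returns 0 to allow the mapping, -EOPNOTSUPP to refuse it.
 */
static int sketch_mmap_check(const struct sketch_dax_device *dax_dev,
			     bool want_map_sync)
{
	if (want_map_sync && sketch_dax_buffered(dax_dev))
		return -EOPNOTSUPP;
	return 0;
}
```

With this model, a device carrying SKETCH_DAXDEV_BUFFERED accepts an
ordinary shared mapping but refuses one that asks for MAP_SYNC, while a
device without the flag accepts both. Where such a check would actually
live (xfs_inode_supports_dax() or elsewhere) is the open question above.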