Re: [PATCH v8 0/6] virtio pmem driver
From: Michael S. Tsirkin
Date: Fri May 10 2019 - 12:31:30 EST
On Fri, May 10, 2019 at 09:21:56PM +0530, Pankaj Gupta wrote:
> Hi Michael & Dan,
> Please review/ack the patch series from LIBNVDIMM & VIRTIO side.
Hope to do this early next week.
> We have ack on ext4, xfs patches(4, 5 & 6) patch 2. Still need
> your ack on nvdimm patches(1 & 3) & virtio patch 2.
> Changes done from v7 are only in patch(2 & 3) and not
> affecting existing reviews. Request to please review.
> This patch series has implementation for "virtio pmem".
> "virtio pmem" is fake persistent memory(nvdimm) in guest
> which allows to bypass the guest page cache. This also
> implements a VIRTIO based asynchronous flush mechanism.
> Sharing guest kernel driver in this patchset with the
> changes suggested in v4. Tested with Qemu side device
> emulation  for virtio-pmem. Documented the impact of
> possible page cache side channel attacks with suggested
> Details of project idea for 'virtio pmem' flushing interface
> is shared  & .
> Implementation is divided into two parts:
> New virtio pmem guest driver and qemu code changes for new
> virtio pmem paravirtualized device.
> 1. Guest virtio-pmem kernel driver
> - Reads persistent memory range from paravirt device and
> registers with 'nvdimm_bus'.
> - 'nvdimm/pmem' driver uses this information to allocate
> persistent memory region and setup filesystem operations
> to the allocated memory.
> - virtio pmem driver implements asynchronous flushing
> interface to flush from guest to host.
> 2. Qemu virtio-pmem device
> - Creates virtio pmem device and exposes a memory range to
> KVM guest.
> - At host side this is file backed memory which acts as
> persistent memory.
> - Qemu side flush uses aio thread pool API's and virtio
> for asynchronous guest multi request handling.
> David Hildenbrand CCed also posted a modified version of
> qemu virtio-pmem code based on updated Qemu memory device API.
> Virtio-pmem security implications and countermeasures:
> In previous posting of kernel driver, there was discussion 
> on possible implications of page cache side channel attacks with
> virtio pmem. After thorough analysis of details of known side
> channel attacks, below are the suggestions:
> - Depends entirely on how host backing image file is mapped
> into guest address space.
> - virtio-pmem device emulation, by default shared mapping is used
> to map host backing file. It is recommended to use separate
> backing file at host side for every guest. This will prevent
> any possibility of executing common code from multiple guests
> and any chance of inferring guest local data based based on
> execution time.
> - If backing file is required to be shared among multiple guests
> it is recommended to don't support host page cache eviction
> commands from the guest driver. This will avoid any possibility
> of inferring guest local data or host data from another guest.
> - Proposed device specification  for virtio-pmem device with
> details of possible security implications and suggested
> countermeasures for device emulation.
> Virtio-pmem errors handling:
> Checked behaviour of virtio-pmem for below types of errors
> Need suggestions on expected behaviour for handling these errors?
> - Hardware Errors: Uncorrectable recoverable Errors:
> a] virtio-pmem:
> - As per current logic if error page belongs to Qemu process,
> host MCE handler isolates(hwpoison) that page and send SIGBUS.
> Qemu SIGBUS handler injects exception to KVM guest.
> - KVM guest then isolates the page and send SIGBUS to guest
> userspace process which has mapped the page.
> b] Existing implementation for ACPI pmem driver:
> - Handles such errors with MCE notifier and creates a list
> of bad blocks. Read/direct access DAX operation return EIO
> if accessed memory page fall in bad block list.
> - It also starts backgound scrubbing.
> - Similar functionality can be reused in virtio-pmem with MCE
> notifier but without scrubbing(no ACPI/ARS)? Need inputs to
> confirm if this behaviour is ok or needs any change?
> Changes from PATCH v7: 
> - Corrected pending request queue logic (patch 2) - Jakub StaroÅ
> - Used unsigned long flags for passing DAXDEV_F_SYNC (patch 3) - Dan
> - Fixed typo => vma 'flag' to 'vm_flag' (patch 4)
> - Added rob in patch 6 & patch 2
> Changes from PATCH v6: 
> - Corrected comment format in patch 5 & patch 6. [Dave]
> - Changed variable declaration indentation in patch 6 [Darrick]
> - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5
> Changes from PATCH v5: 
> Changes suggested in by - [Cornelia, Yuval]
> - Remove assignment chaining in virtio driver
> - Better error message and remove not required free
> - Check nd_region before use
> Changes suggested by - [Jan Kara]
> - dax_synchronous() for !CONFIG_DAX
> - Correct 'daxdev_mapping_supported' comment and non-dax implementation
> Changes suggested by - [Dan Williams]
> - Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
> - Gate nvdimm_flush instead of additional async parameter
> - Move block chaining logic to flush callback than common nvdimm_flush
> - Use NULL flush callback for generic flush for better readability [Dan, Jan]
> - Use virtio device id 27 from 25(already used) - [MST]
> Changes from PATCH v4:
> - Factor out MAP_SYNC supported functionality to a common helper
> [Dave, Darrick, Jan]
> - Comment, indentation and virtqueue_kick failure handle - Yuval Shaia
> Changes from PATCH v3:
> - Use generic dax_synchronous() helper to check for DAXDEV_SYNC
> flag - [Dan, Darrick, Jan]
> - Add 'is_nvdimm_async' function
> - Document page cache side channel attacks implications &
> countermeasures - [Dave Chinner, Michael]
> Changes from PATCH v2:
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
> Changes from PATCH v1:
> - 0-day build test for build dependency on libnvdimm
> Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Donât move declarations to common global header e.g nd.h
> - nvdimm_flush() return 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Remove BSD license text with SPDX license text
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
> Pankaj Gupta (6):
> libnvdimm: nd_region flush callback support
> virtio-pmem: Add virtio-pmem guest driver
> libnvdimm: add nd_region buffered dax_dev flag
> dax: check synchronous mapping is supported
> ext4: disable map_sync for virtio pmem
> xfs: disable map_sync for virtio pmem
>  https://lkml.org/lkml/2019/4/26/36
>  https://lkml.org/lkml/2019/4/23/1092
>  https://www.spinics.net/lists/kvm/msg149761.html
>  https://www.spinics.net/lists/kvm/msg153095.html
>  https://lkml.org/lkml/2018/8/31/413
>  https://marc.info/?l=linux-kernel&m=153572228719237&w=2
>  https://marc.info/?l=qemu-devel&m=153555721901824&w=2
>  https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
>  https://lkml.org/lkml/2019/1/9/1191
> drivers/acpi/nfit/core.c | 4 -
> drivers/dax/bus.c | 2
> drivers/dax/super.c | 13 +++
> drivers/md/dm.c | 3
> drivers/nvdimm/Makefile | 1
> drivers/nvdimm/claim.c | 6 +
> drivers/nvdimm/nd.h | 1
> drivers/nvdimm/nd_virtio.c | 129 +++++++++++++++++++++++++++++++++++++++
> drivers/nvdimm/pmem.c | 18 +++--
> drivers/nvdimm/region_devs.c | 33 +++++++++
> drivers/nvdimm/virtio_pmem.c | 117 +++++++++++++++++++++++++++++++++++
> drivers/virtio/Kconfig | 10 +++
> fs/ext4/file.c | 10 +--
> fs/xfs/xfs_file.c | 9 +-
> include/linux/dax.h | 25 ++++++-
> include/linux/libnvdimm.h | 9 ++
> include/linux/virtio_pmem.h | 60 ++++++++++++++++++
> include/uapi/linux/virtio_ids.h | 1
> include/uapi/linux/virtio_pmem.h | 10 +++
> 19 files changed, 436 insertions(+), 25 deletions(-)