Re: [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen

From: Haozhong Zhang
Date: Tue Oct 11 2016 - 03:12:09 EST


On 10/10/16 09:24, Dan Williams wrote:
> On Sun, Oct 9, 2016 at 11:32 PM, Haozhong Zhang
> <haozhong.zhang@xxxxxxxxx> wrote:
> > On 10/09/16 20:45, Dan Williams wrote:
> >> On Sun, Oct 9, 2016 at 5:35 PM, Haozhong Zhang <haozhong.zhang@xxxxxxxxx> wrote:
> >> > Overview
> >> > ========
> >> > This RFC kernel patch series along with corresponding patch series of
> >> > Xen, QEMU and ndctl implements Xen vNVDIMM, which can map the host
> >> > NVDIMM devices to Xen HVM domU as vNVDIMM devices.
> >> >
> >> > Xen hypervisor does not include an NVDIMM driver, so it needs the
> >> > assistance from the driver in Dom0 Linux kernel to manage NVDIMM
> >> > devices. We currently only support NVDIMM devices in pmem mode.
> >> >
> >> > Design and Implementation
> >> > =========================
> >> > The complete design can be found at
> >> > https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg01921.html.
> >>
> >> The KVM enabling for persistent memory does not need this support from
> >> the kernel, and as far as I can see neither does Xen. If the
> >> hypervisor needs to reserve some space it can simply trim the amount
> >> that it hands to the guest.
> >>
> >
> > Xen does not have the NVDIMM driver, so it cannot operate on NVDIMM
> > devices by itself. Instead it relies on the driver in Dom0 Linux to
> > probe NVDIMM and make the reservation.
>
> I'm missing something because the design document talks about mmap'ing
> files on a DAX filesystem. So, I'm assuming it is similar to the KVM
> NVDIMM virtualization case where an mmap range in dom0 is translated
> into a guest physical range. The suggestion is to reserve some memory
> out of that mapping rather than introduce a new info block /
> reservation type to the sub-system.

Just as Linux uses struct page, the Xen hypervisor uses struct page_info
for its memory management. We face the same problem as the Linux
kernel: where to store those structs for pmem. We decided to put them
in a reserved area on the pmem itself, similar to what the pfn device
in the kernel does.

Reserving at the time of mmap, out of what is being mapped, does not
work. It's a bootstrap problem: Xen needs the information about those
pages, which is stored in struct page_info, at the time of mapping.
That is, the page_info structs for pmem pages must be set up before
those pages are actually used.
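
To give a rough feel for the size of such a reservation, here is a
minimal sketch. The 4 KiB page size and the 32-byte struct size are
assumptions for illustration only, and the helper names are
hypothetical; none of this is taken from the patch series. It simply
assumes one struct per page of the region, including the reserved
pages themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not taken from the patches. */
#define PAGE_SIZE_BYTES      4096ULL  /* assumption: 4 KiB pages */
#define PAGE_INFO_SIZE_BYTES 32ULL    /* assumption: size of one page_info */

/* Number of pages needed to hold `bytes` bytes, rounded up. */
static uint64_t pages_for(uint64_t bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/*
 * Pages to set aside in a pmem region for the bookkeeping structs.
 * Simplification: one struct for every page in the region, including
 * the reserved pages themselves.
 */
static uint64_t reserved_pages(uint64_t region_bytes)
{
    assert(region_bytes % PAGE_SIZE_BYTES == 0);
    uint64_t total_pages = region_bytes / PAGE_SIZE_BYTES;
    return pages_for(total_pages * PAGE_INFO_SIZE_BYTES);
}

/* Pages left over for actual use once the reservation is made. */
static uint64_t usable_pages(uint64_t region_bytes)
{
    return region_bytes / PAGE_SIZE_BYTES - reserved_pages(region_bytes);
}
```

Under these assumptions, a 1 GiB region (262144 pages) would need
2048 pages (8 MiB) for the structs, leaving 260096 pages usable. The
point is only that the reserved area has to exist, and be populated,
before any of the remaining pages can be mapped.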

However, according to the ongoing discussion with Andrew Cooper in
another thread, if the Xen hypervisor switches to treating pmem pages
as MMIO, then the reservation may not be needed. Let's see what
conclusion is reached there.

Thanks,
Haozhong