Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen

From: Dan Williams
Date: Tue Oct 11 2016 - 13:57:34 EST


On Tue, Oct 11, 2016 at 9:58 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Tue, Oct 11, 2016 at 08:53:33AM -0700, Dan Williams wrote:
>> On Tue, Oct 11, 2016 at 6:08 AM, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> >>>> Andrew Cooper <andrew.cooper3@xxxxxxxxxx> 10/10/16 6:44 PM >>>
>> >>On 10/10/16 01:35, Haozhong Zhang wrote:
>> >>> Xen hypervisor needs assistance from Dom0 Linux kernel for following tasks:
>> >>> 1) Reserve an area on NVDIMM devices for Xen hypervisor to place
>> >>> memory management data structures, i.e. frame table and M2P table.
>> >>> 2) Report SPA ranges of NVDIMM devices and the reserved area to Xen
>> >>> hypervisor.
>> >>
>> >>However, I can't see any justification for 1). Dom0 should not be
>> >>involved in Xen's management of its own frame table and m2p. The mfns
>> >>making up the pmem/pblk regions should be treated just like any other
>> >>MMIO regions, and be handed wholesale to dom0 by default.
>> >
>> > That precludes the use as RAM extension, and I thought earlier rounds of
>> > discussion had got everyone in agreement that at least for the pmem case
>> > we will need some control data in Xen.
>>
>> The missing piece for me is why this reservation for control data
>> needs to be done in the libnvdimm core? I would expect that any dax
>
> Isn't it done this way with Linux? That is say if the machine has
> 4GB of RAM and the NVDIMM is in TB range. You want to put the 'struct page'
> for the NVDIMM ranges somewhere. That place can be in regions on the
> NVDIMM that ndctl can reserve.

Yes.
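
A rough sketch of the arithmetic behind that, assuming 4 KiB pages and a
64-byte 'struct page' (typical x86-64 values, assumed here rather than
taken from this thread; the helper name is hypothetical):

```c
#include <stdint.h>

/* Assumed values, not stated in this thread. */
#define ASSUMED_PAGE_SIZE        4096ULL  /* 4 KiB base pages */
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL    /* bytes per 'struct page' */

/*
 * Hypothetical helper: bytes of 'struct page' metadata needed to
 * describe a physical range of range_bytes, rounding up to whole pages.
 */
static uint64_t memmap_bytes(uint64_t range_bytes)
{
    uint64_t pages = (range_bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;

    return pages * ASSUMED_STRUCT_PAGE_SIZE;
}
```

Under those assumptions, memmap_bytes(1ULL << 40) for a 1 TiB NVDIMM comes
to 16 GiB of metadata, which does not fit in 4GB of RAM and is why the
reservation lands on the NVDIMM itself.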

>> capable file could be mapped and made available to a guest. This
>> includes /dev/ramX devices that are dax capable, but are external to
>> the libnvdimm sub-system.
>
> This is more of just keeping track of the ranges if say the DAX file is
> extremely fragmented and requires a lot of 'struct pages' to keep track of
> when stitching up the VMA.

Right, but why does the libnvdimm core need to know about this
specific Xen reservation? For example, if Xen wants some in-kernel
driver to own a pmem region and place its own metadata on the device I
would recommend something like:

bdev = blkdev_get_by_path("/dev/pmemX", FMODE_EXCL...);
bdev_direct_access(bdev, ...);

...in other words, I don't think we want libnvdimm to grow new device
types for every possible in-kernel user (Xen, MD, DM, etc.). Instead,
just claim the resulting device.