Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen

From: Dan Williams
Date: Thu Oct 13 2016 - 15:08:20 EST


On Thu, Oct 13, 2016 at 9:01 AM, Andrew Cooper
<andrew.cooper3@xxxxxxxxxx> wrote:
> On 13/10/16 16:40, Dan Williams wrote:
>> On Thu, Oct 13, 2016 at 2:08 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>> [..]
>>>> I think we can do the similar for Xen, like to lay another pseudo
>>>> device on /dev/pmem and do the reservation, like 2. in my previous
>>>> reply.
>>> Well, my opinion certainly doesn't count much here, but I continue to
>>> consider this a bad idea. For entities like drivers it may well be
>>> appropriate, but I think there ought to be an independent concept
>>> of "OS reserved", and in the Xen case this could then be shared
>>> between hypervisor and Dom0 kernel. Or if we were to consider Dom0
>>> "just a guest", things should even be the other way around: Xen gets
>>> all of the OS reserved space, and Dom0 needs something custom.
>> You haven't made the case why Xen is special and other applications of
>> persistent memory are not.
>
> In a Xen system, Xen runs in the baremetal root-mode ring0, and dom0 is
> a VM running in ring1/3 with the nvdimm driver. This is the opposite
> way around to the KVM model.
>
> Dom0, being the hardware domain, has default ownership of all the
> hardware, but to gain access in the first place, it must request a
> mapping from Xen.

This is where my understanding of the Xen model breaks down. Are you
saying dom0 can't access the persistent memory range unless the ring0
agent has metadata storage space for tracking what it maps into dom0?
That can't be true, because then PCI memory ranges would not work
without metadata reservation space. Dom0 still needs to map and write
the DIMMs even to set up the struct page reservation; it isn't
established by default.

> Xen therefore needs to know and cope with being able
> to give dom0 a mapping to the nvdimms, without touching the content of
> the nvdimm itself (so as to avoid corrupting data).

Is it true that this metadata only comes into use when remapping the
dom0-discovered range(s) into a guest VM?

> Once dom0 has a mapping of the nvdimm, the nvdimm driver can go to work
> and figure out what is on the DIMM, and which areas are safe to use.

I don't understand this ordering of events. Dom0 needs to have a
mapping to even write the on-media structure that indicates a
reservation. So, initial dom0 access can't depend on a metadata
reservation already being present.

> At this point, a Xen subsystem in Linux could choose one or more areas
> to hand back to the hypervisor to use as RAM/other.

To me, all this configuration comes after the fact. After dom0
sees /dev/pmemX devices, it can go to work carving them up and
writing Xen-specific metadata to the range(s). The struct page
reservation never comes into the picture. In fact, a raw-mode
namespace (one without a reservation) could be used in this model; the
nvdimm core never needs to know what is happening.