Re: [PATCH v2 2/3] libnvdimm, pmem: adjust for section collisions with 'System RAM'

From: Toshi Kani
Date: Mon Mar 07 2016 - 12:04:39 EST


On Fri, 2016-03-04 at 18:23 -0800, Dan Williams wrote:
> On Fri, Mar 4, 2016 at 6:48 PM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > On Thu, 2016-03-03 at 13:53 -0800, Dan Williams wrote:
> > > On a platform where 'Persistent Memory' and 'System RAM' are mixed
> > > within a given sparsemem section, trim the namespace and notify about
> > > the
> > > sub-optimal alignment.
> > >
> > > Cc: Toshi Kani <toshi.kani@xxxxxxx>
> > > Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > ---
> > > Âdrivers/nvdimm/namespace_devs.c |ÂÂÂÂ7 ++
> > > Âdrivers/nvdimm/pfn.hÂÂÂÂÂÂÂÂÂÂÂÂ|ÂÂÂ10 ++-
> > > Âdrivers/nvdimm/pfn_devs.cÂÂÂÂÂÂÂ|ÂÂÂÂ5 ++
> > > Âdrivers/nvdimm/pmem.cÂÂÂÂÂÂÂÂÂÂÂ|ÂÂ125 ++++++++++++++++++++++++++++-
> > > ----
> > > ------
> > > Â4 files changed, 111 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/namespace_devs.c
> > > b/drivers/nvdimm/namespace_devs.c
> > > index 8ebfcaae3f5a..463756ca2d4b 100644
> > > --- a/drivers/nvdimm/namespace_devs.c
> > > +++ b/drivers/nvdimm/namespace_devs.c
> > > @@ -133,6 +133,7 @@ bool nd_is_uuid_unique(struct device *dev, u8
> > > *uuid)
> > > Âbool pmem_should_map_pages(struct device *dev)
> > > Â{
> > > ÂÂÂÂÂÂstruct nd_region *nd_region = to_nd_region(dev->parent);
> > > +ÂÂÂÂÂstruct nd_namespace_io *nsio;
> > >
> > > ÂÂÂÂÂÂif (!IS_ENABLED(CONFIG_ZONE_DEVICE))
> > > ÂÂÂÂÂÂÂÂÂÂÂÂÂÂreturn false;
> > > @@ -143,6 +144,12 @@ bool pmem_should_map_pages(struct device *dev)
> > > ÂÂÂÂÂÂif (is_nd_pfn(dev) || is_nd_btt(dev))
> > > ÂÂÂÂÂÂÂÂÂÂÂÂÂÂreturn false;
> > >
> > > +ÂÂÂÂÂnsio = to_nd_namespace_io(dev);
> > > +ÂÂÂÂÂif (region_intersects(nsio->res.start, resource_size(&nsio-
> > > > res),
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂIORESOURCE_SYSTEM_RAM,
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂIORES_DESC_NONE) == REGION_MIXED)
> >
> > Should this be != REGION_DISJOINT for safe?
>
> Acutally, it's ok.ÂÂIt doesn't need to be disjoint.ÂÂThe problem is
> mixing an mm-zone within a given section.ÂÂIf the region intersects
> system-ram then devm_memremap_pages() is a no-op and we can use the
> existing page allocation and linear mapping.

Oh, I see.

> >
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂreturn false;
> > > +
> >
> > Â:
> >
> > > @@ -304,21 +311,56 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> > > ÂÂÂÂÂÂ}
> > >
> > > ÂÂÂÂÂÂmemset(pfn_sb, 0, sizeof(*pfn_sb));
> > > -ÂÂÂÂÂnpfns = (pmem->size - SZ_8K) / SZ_4K;
> > > +
> > > +ÂÂÂÂÂ/*
> > > +ÂÂÂÂÂÂ* Check if pmem collides with 'System RAM' when section
> > > aligned
> > > and
> > > +ÂÂÂÂÂÂ* trim it accordingly
> > > +ÂÂÂÂÂÂ*/
> > > +ÂÂÂÂÂnsio = to_nd_namespace_io(&ndns->dev);
> > > +ÂÂÂÂÂstart = PHYS_SECTION_ALIGN_DOWN(nsio->res.start);
> > > +ÂÂÂÂÂsize = resource_size(&nsio->res);
> > > +ÂÂÂÂÂif (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂIORES_DESC_NONE) == REGION_MIXED) {
> > > +
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂstart = nsio->res.start;
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂstart_pad = PHYS_SECTION_ALIGN_UP(start) - start;
> > > +ÂÂÂÂÂ}
> > > +
> > > +ÂÂÂÂÂstart = nsio->res.start;
> > > +ÂÂÂÂÂsize = PHYS_SECTION_ALIGN_UP(start + size) - start;
> > > +ÂÂÂÂÂif (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂIORES_DESC_NONE) == REGION_MIXED) {
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂsize = resource_size(&nsio->res);
> > > +ÂÂÂÂÂÂÂÂÂÂÂÂÂend_trunc = start + size -
> > > PHYS_SECTION_ALIGN_DOWN(start
> > > + size);
> > > +ÂÂÂÂÂ}
> >
> > This check seems to assume that guest's regular memory layout does not
> > change.ÂÂThat is, if there is no collision at first, there won't be any
> > later.ÂÂIs this a valid assumption?
>
> If platform firmware changes the physical alignment during the
> lifetime of the namespace there's not much we can do.ÂÂ

The physical alignment can be changed as long as it is large enough (see
below).

> Another problem
> not addressed by this patch is firmware choosing to hot plug system
> ram into the same section as persistent memory.ÂÂ

Yes, and it does not have to be a hot-plug operation. ÂMemory size may be
changed off-line. ÂData image can be copied to different guests for instant
deployment, or may be migrated to a different guest.

> As far as I can see
> all we do is ask firmware implementations to respect Linux section
> boundaries and otherwise not change alignments.

In addition to the requirement that pmem range alignment may not change,
the code also requires a regular memory range does not change to intersect
with a pmem section later. ÂThis seems fragile to me since guest config may
vary / change as I mentioned above.

So, shouldn't the driver fails to attach when the range is not aligned by
the section size? ÂSince we need to place a requirement to firmware anyway,
we can simply state that it must be aligned by 128MiB (at least) on x86.
ÂThen, memory and pmem physical layouts can be changed as long as this
requirement is met.

Thanks,
-Toshi