Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration

From: Michal Hocko
Date: Fri Dec 28 2018 - 14:52:29 EST

[Ccing Mel and Andrea]

On Fri 28-12-18 21:31:11, Wu Fengguang wrote:
> > > > I haven't looked at the implementation yet but if you are proposing a
> > > > special cased zone lists then this is something CDM (Coherent Device
> > > > Memory) was trying to do two years ago and there was quite some
> > > > skepticism about the approach.
> > >
> > > It looks like we are pretty different from CDM. :)
> > > We are creating new NUMA nodes rather than CDM's new ZONE.
> > > The zonelists modification is just to make PMEM nodes more separated.
> >
> > Yes, this is exactly what CDM was after. Have a zone which is not
> > reachable without explicit request AFAIR. So no, I do not think you are
> > too different, you just use a different terminology ;)
> Got it. OK.. The fallback zonelists patch does need more thought.
> From a long-term POV, Linux should be prepared for multi-level memory.
> Then the need will arise to "allocate from this level of memory".
> So it looks good to have separate zonelists for each level of memory.

Well, I do not have a good answer for you here. We do not have good
experience with those systems, I am afraid. NUMA has been with us for
more than a decade, yet our APIs are coarse to say the least and have
been broken many times as well. Starting a new API just based on PMEM
sounds like a ticket to another disaster to me.

I would like to see solid arguments why the current model of NUMA nodes
with fallback in distance order cannot be used for those new
technologies in the beginning, so that we can develop something better
based on the experience we gain along the way.
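
To make "fallback in distance order" concrete, here is a minimal sketch,
not kernel code. It assumes a made-up four-node distance table in the
style of /sys/devices/system/node/nodeN/distance, where nodes 0 and 1
stand in for DRAM and nodes 2 and 3 for PMEM. The table values and the
fallback_order() helper are hypothetical, and the real zonelist
construction applies further heuristics beyond a plain sort.

```python
# Hypothetical distance table (10 = local), one row per node.
# Nodes 0 and 1 stand in for DRAM, nodes 2 and 3 for PMEM (assumption).
DISTANCE = [
    [10, 20, 17, 28],
    [20, 10, 28, 17],
    [17, 28, 10, 28],
    [28, 17, 28, 10],
]


def fallback_order(distance, node):
    """Return all node ids, nearest first, as seen from `node`.

    Ties are broken by node id. This is a simplification of what the
    kernel does when it builds its fallback lists.
    """
    return sorted(range(len(distance)), key=lambda n: (distance[node][n], n))


if __name__ == "__main__":
    for node in range(len(DISTANCE)):
        print(node, fallback_order(DISTANCE, node))
```

With such a table a PMEM node is just another entry in the ordinary
fallback order of its nearest DRAM node, so no special-cased list is
needed to reach it once closer memory is exhausted.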

I would be especially interested in the possibility of migrating memory
under memory pressure and relying on NUMA balancing to restore locality
on demand, rather than hiding certain NUMA nodes or zones from the
allocator and exposing them only to userspace.

> On the other hand, there will also be page allocations that don't care
> about the exact memory level. So it looks reasonable to expect
> different kinds of fallback zonelists that can be selected by NUMA policy.
> Thanks,
> Fengguang

Michal Hocko