Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration

From: Yang Shi
Date: Fri Dec 28 2018 - 13:28:54 EST

On Fri, Dec 28, 2018 at 5:31 AM Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> >> > I haven't looked at the implementation yet but if you are proposing a
> >> > special cased zone lists then this is something CDM (Coherent Device
> >> > Memory) was trying to do two years ago and there was quite some
> >> > skepticism in the approach.
> >>
> >> It looks like we are pretty different from CDM. :)
> >> We are creating new NUMA nodes rather than a new ZONE as CDM did.
> >> The zonelists modification is just to make PMEM nodes more separated.
> >
> >Yes, this is exactly what CDM was after. Have a zone which is not
> >reachable without explicit request AFAIR. So no, I do not think you are
> >too different, you just use a different terminology ;)
> Got it. OK.. The fallback zonelists patch does need more thought.
> From a long-term POV, Linux should be prepared for multi-level memory.
> Then the need will arise to "allocate from this level of memory".
> So it looks good to have separate zonelists for each level of memory.

I tend to agree with Fengguang. We do need finer-grained control over
DRAM and PMEM usage, for example controlling the percentage of DRAM
and PMEM backing a specific VMA.

NUMA policy is not good enough for some use cases, since it can only
control which mempolicy applies to which memory range. In our use case
the memory access pattern within a VMA is random, so we can't control
the percentage with mempolicy. We have to put PMEM into a separate
zonelist to make sure allocations come from PMEM when certain criteria
are met, as Fengguang does in this patch series.
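
To make the "percentage per VMA" idea concrete, here is a minimal
user-space sketch of the kind of decision I have in mind at page fault
time. This is only an illustration, not code from the patch series:
enum tier, struct vma_tier_stats and pick_tier() are hypothetical names
and do not exist in the kernel. It assumes the allocator can ask, per
VMA, which memory level the next page should come from.

```c
#include <stddef.h>

/* Hypothetical memory tiers; these are not kernel identifiers. */
enum tier { TIER_DRAM, TIER_PMEM };

/* Hypothetical per-VMA accounting; this is not a kernel struct. */
struct vma_tier_stats {
	unsigned int dram_percent;	/* target share of pages in DRAM, 0..100 */
	size_t dram_pages;		/* pages placed in DRAM so far */
	size_t total_pages;		/* pages placed in either tier so far */
};

/*
 * Pick the tier for the next faulted page so that the running DRAM
 * share of this VMA never exceeds dram_percent. The choice depends
 * only on the counters, not on the faulting address, so it works even
 * when accesses within the VMA are random.
 */
static enum tier pick_tier(struct vma_tier_stats *s)
{
	enum tier t = TIER_PMEM;

	/* Would one more DRAM page still keep the DRAM share <= target? */
	if ((s->dram_pages + 1) * 100 <=
	    (s->total_pages + 1) * s->dram_percent)
		t = TIER_DRAM;

	if (t == TIER_DRAM)
		s->dram_pages++;
	s->total_pages++;

	return t;
}
```

A mempolicy bound to an address range can't express this, because the
decision is driven by how many pages the VMA already has in each level
rather than by where in the range the fault lands. With a separate
PMEM zonelist, a TIER_PMEM result would simply select that zonelist
for the allocation.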


> On the other hand, there will also be page allocations that don't care
> about the exact memory level. So it looks reasonable to expect
> different kinds of fallback zonelists that can be selected by NUMA policy.
> Thanks,
> Fengguang