Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous memory system

From: Michal Hocko
Date: Thu Apr 25 2019 - 03:53:57 EST


On Thu 25-04-19 07:41:40, Du, Fan wrote:
>
>
> >-----Original Message-----
> >From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
> >Sent: Thursday, April 25, 2019 2:37 PM
> >To: Du, Fan <fan.du@xxxxxxxxx>
> >Cc: akpm@xxxxxxxxxxxxxxxxxxxx; Wu, Fengguang <fengguang.wu@xxxxxxxxx>;
> >Williams, Dan J <dan.j.williams@xxxxxxxxx>; Hansen, Dave
> ><dave.hansen@xxxxxxxxx>; xishi.qiuxishi@xxxxxxxxxxxxxxx; Huang, Ying
> ><ying.huang@xxxxxxxxx>; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
> >memory system
> >
> >On Thu 25-04-19 09:21:30, Fan Du wrote:
> >[...]
> >> However PMEM has different characteristics from DRAM,
> >> the more reasonable or desirable fallback style would be:
> >> DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3.
> >> When DRAM is exhausted, try PMEM then.
> >
> >Why, and who cares? NUMA is fundamentally about memory nodes with
> >different access characteristics, so why is PMEM special?
>
> Michal, thanks for your comments!
>
> The "different" there lies in local versus remote access; usually the
> underlying memory is the same type, i.e. DRAM.
>
> By "special", I mean that PMEM usually has far larger capacity per DIMM
> than DRAM, and different read/write access latency than DRAM.

You are describing NUMA in general here. Yes, access to different NUMA
nodes has different read/write latency. But that doesn't make PMEM
really special compared to regular DRAM. There are a few other people
trying to work with PMEM as NUMA nodes, and these kinds of arguments are
repeated again and again. So far I haven't really heard much beyond hand
waving. Please go and read through those discussions so that we do not
have to go through the same set of arguments again.

I absolutely do see and understand that people want to find a way to use
their shiny NVDIMMs, but please step back and try to think in more
general terms than "PMEM is special and we have to treat it that way".
We currently have ways to use it as a DAX device and as a NUMA node, so
focus on how to improve our NUMA handling so that we can get the maximum
out of the HW rather than make a PMEM NUMA node a special snowflake.

Thank you.

--
Michal Hocko
SUSE Labs