RE: [RFC PATCH 0/5] New fallback workflow for heterogeneous memory system

From: Du, Fan
Date: Thu Apr 25 2019 - 03:41:46 EST

>-----Original Message-----
>From: Michal Hocko [mailto:mhocko@xxxxxxxxxx]
>Sent: Thursday, April 25, 2019 2:37 PM
>To: Du, Fan <fan.du@xxxxxxxxx>
>Cc: akpm@xxxxxxxxxxxxxxxxxxxx; Wu, Fengguang <fengguang.wu@xxxxxxxxx>;
>Williams, Dan J <dan.j.williams@xxxxxxxxx>; Hansen, Dave
><dave.hansen@xxxxxxxxx>; xishi.qiuxishi@xxxxxxxxxxxxxxx; Huang, Ying
><ying.huang@xxxxxxxxx>; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
>Subject: Re: [RFC PATCH 0/5] New fallback workflow for heterogeneous
>memory system
>
>On Thu 25-04-19 09:21:30, Fan Du wrote:
>[...]
>> However PMEM has different characteristics from DRAM,
>> the more reasonable or desirable fallback style would be:
>> DRAM node 0 -> DRAM node 1 -> PMEM node 2 -> PMEM node 3.
>> When DRAM is exhausted, try PMEM then.
>
>Why and who does care? NUMA is fundamentally about memory nodes with
>different access characteristics so why is PMEM any special?

Michal, thanks for your comments!

In the traditional case, the "different" access characteristics come from
local versus remote access; the underlying memory is usually the same
type, i.e. DRAM.

PMEM is "special" because it usually offers much larger capacity per DIMM
than DRAM, but with different read/write access latency. In other words,
PMEM sits right below DRAM in the memory tier hierarchy.

This makes PMEM far memory, or second-class memory. So we give
first-class DRAM pages to users first, and fall back to PMEM only when
necessary.

A Cloud Service Provider can use DRAM + PMEM in their systems, leveraging
the method in [1] to keep hot pages in DRAM and warm or cold pages in
PMEM, achieving optimal performance and reducing total cost of ownership
at the same time.

[1]: https://github.com/fengguang/memory-optimizer

>--
>Michal Hocko
>SUSE Labs