Re: [RFC][PATCH v2 10/21] mm: build separate zonelist for PMEM and DRAM node

From: Aneesh Kumar K.V
Date: Mon Jan 07 2019 - 09:09:41 EST


Fengguang Wu <fengguang.wu@xxxxxxxxx> writes:

> On Tue, Jan 01, 2019 at 02:44:41PM +0530, Aneesh Kumar K.V wrote:
>>Fengguang Wu <fengguang.wu@xxxxxxxxx> writes:
>>
>>> From: Fan Du <fan.du@xxxxxxxxx>
>>>
>>> When allocate page, DRAM and PMEM node should better not fall back to
>>> each other. This allows migration code to explicitly control which type
>>> of node to allocate pages from.
>>>
>>> With this patch, PMEM NUMA node can only be used in 2 ways:
>>> - migrate in and out
>>> - numactl
>>
>>Can we achieve this using nodemask? That way we don't tag nodes with
>>different properties such as DRAM/PMEM. We can then give the
>>flexibilility to the device init code to add the new memory nodes to
>>the right nodemask
>
> Aneesh, in patch 2 we did create nodemask numa_nodes_pmem and
> numa_nodes_dram. What's your supposed way of "using nodemask"?
>

IIUC the patch avoids allocating from PMEM nodes, and the way you
achieve that is by checking is_node_pmem(n). We already have an
abstraction for keeping allocations away from a node: the nodemask. I
was wondering whether we can do the equivalent of the above using that.

i.e., __next_zone_zonelist can call zref_in_nodemask(z,
default_exclude_nodemask) and decide whether to use the specific zone
or not.
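
A rough userspace sketch of the idea, not kernel code. The sketch_*
names are made up for illustration, and the rule that an explicit
caller mask (numactl, migration) bypasses the exclusion is an
assumption about how this could behave, not something the patch does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 64

/* Userspace stand-in for the kernel's nodemask_t: one bit per NUMA node. */
typedef struct { unsigned long long bits; } sketch_nodemask;

/* Simplified zoneref: only the id of the node that owns the zone. */
struct sketch_zoneref { int node; };

/* Hypothetical mask of nodes that default allocations must skip. */
static sketch_nodemask default_exclude_nodemask;

/* Device init code would call this to add its node to a mask. */
static void sketch_node_set(int nid, sketch_nodemask *mask)
{
	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	mask->bits |= 1ULL << nid;
}

/* True if the zone's owning node has its bit set in 'mask'. */
static bool sketch_zref_in_nodemask(const struct sketch_zoneref *z,
				    const sketch_nodemask *mask)
{
	return (mask->bits >> z->node) & 1ULL;
}

/*
 * Return the index of the first usable zoneref at or after 'start',
 * or -1 if there is none.
 *
 * allowed == NULL models a default allocation: nodes in
 * default_exclude_nodemask are skipped.
 * allowed != NULL models an explicit request: only the caller's mask
 * is consulted, so excluded nodes stay reachable (assumption).
 */
static int sketch_next_zone(const struct sketch_zoneref *zl, int nr,
			    int start, const sketch_nodemask *allowed)
{
	for (int i = start; i < nr; i++) {
		if (allowed) {
			if (sketch_zref_in_nodemask(&zl[i], allowed))
				return i;
		} else if (!sketch_zref_in_nodemask(&zl[i],
						    &default_exclude_nodemask)) {
			return i;
		}
	}
	return -1;
}
```

Nothing in the walk knows what kind of memory a node holds; it only
tests membership in a mask that the init code filled in.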

That way we don't need to add special-case code like:

+ PGDAT_DRAM, /* Volatile DRAM memory node */
+ PGDAT_PMEM, /* Persistent memory node */

The reason is that there could be other kinds of device memory that also
want to be excluded from default allocation, as you are doing for PMEM.

-aneesh