Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers

From: Aneesh Kumar K V
Date: Mon Jun 06 2022 - 05:03:06 EST


On 6/6/22 2:22 PM, Ying Huang wrote:
....
I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to
MEMORY_TIER_PMEM" before switching the demotion logic so that on systems
with two memory tiers (DRAM and pmem) the demotion continues to work
as expected after patch 3 ("mm/demotion: Build demotion targets based on
explicit memory tiers"). With that, there will not be any regression in
between the patch series.


Thanks! Please do that. And I think you can add sysfs interface after
that patch too. That is, in [1/7]


I am not sure why you insist on moving sysfs interfaces later. They are
introduced based on the helper added. It make patch review easier to
look at both the helpers and the user of the helper together in a patch.

Yes. We should introduce a function and its user in one patch for
review. But this doesn't mean that we should introduce the user space
interface as the first step. I think the user space interface should
output correct information when we expose it.


If you look at this patchset we are not exposing any wrong information.

patch 1 -> adds ability to register the memory tiers and expose details of registered memory tier. At this point the patchset only support DRAM tier and hence only one tier is shown

patch 2 -> adds per node memtier attribute. So only DRAM nodes shows the details, because the patchset yet has not introduced a slower memory tier like PMEM.

patch 4 -> introducing demotion. Will make that patch 5

patch 5 -> add dax kmem numa nodes as slower memory tier. Now this becomes patch 4 at which point we will correctly show two memory tiers in the system.


+struct memory_tier {
+ nodemask_t nodelist;
+};

And struct device can be added after the kernel has switched the
implementation based on explicit memory tiers.

+struct memory_tier {
+ struct device dev;
+ nodemask_t nodelist;
+};



Can you elaborate on this? or possibly review the v5 series indicating
what change you are suggesting here?


But I don't think it's a good idea to have "struct device" embedded in
"struct memory_tier". We don't have "struct device" embedded in "struct
pgdata_list"...


I avoided creating an array for memory_tier (memory_tier[]) so that we
can keep it dynamic. Keeping dev embedded in struct memory_tier simplify
the life cycle management of that dynamic list. We free the struct
memory_tier allocation via device release function (memtier->dev.release
= memory_tier_device_release )

Why do you think it is not a good idea?

I think that we shouldn't bind our kernel internal implementation with
user space interface too much. Yes. We can expose kernel internal
implementation to user space in a direct way. I suggest you to follow
the style of "struct pglist_data" and "struct node". If we decouple
"struct memory_tier" and "struct memory_tier_dev" (or some other name),
we can refer to "struct memory_tier" without depending on all device
core. Memory tier should be accessible inside the kernel even without a
user interface. And memory tier isn't a device in concept.


memory_tiers are different from pglist_data and struct node in that we also allow the creation of them from userspace. That is the life time of a memory tier is driven from userspace and it is much easier to manage them via sysfs file lifetime mechanism rather than inventing an independent and more complex way of doing the same.

For life cycle management, I think that we can do that without sysfs
too.


unless there are specific details that you think will be broken by embedding struct device inside struct memory_tier, IMHO I still consider the embedded implementation much simpler and in accordance with other kernel design patterns.

-aneesh