Re: [PATCH RFC 1/4] drivers/base/node: Add demotion_nodes sys interface

From: Zhijian Li (Fujitsu)
Date: Fri Feb 02 2024 - 02:44:37 EST



On 31/01/2024 11:17, Li Zhijian wrote:
>>> node[0].preferred = 2
>>> node[0].demotion_targets = 2-5
>>> node[1].preferred = 5
>>> node[1].demotion_targets = 2-5
>>> node[2].preferred = 4
>>> node[2].demotion_targets = 3-4
>>> node[3].preferred = <empty>
>>> node[3].demotion_targets = <empty>
>>> node[4].preferred = <empty>
>>> node[4].demotion_targets = <empty>
>>> node[5].preferred = 3
>>> node[5].demotion_targets = 3-4
>>>
>>> But this demotion path is not explicitly known to administrator. And with
>>> the feedback from our customers, they also think it is helpful to know demotion
>>> path built by kernel to understand the demotion behaviors.
>>>
>>> So I think we should have 2 new interfaces for each node:
>>>
>>> /sys/devices/system/node/nodeN/demotion_allowed_nodes
>>> /sys/devices/system/node/nodeN/demotion_preferred_nodes
>>>
>>> I value your opinion, and I'd like to know what you think about...
>>
>> Per my understanding, we will not expose everything inside kernel to
>> user space.  For page placement in a tiered memory system, demotion is
>> just a part of the story.  For example, if the DRAM of a system becomes
>> full, new page allocation will fall back to the CXL memory.  Have we
>> exposed the default page allocation fallback order to user space?


Back to our initial requirement:
"When demotion is enabled, what is the demotion path, especially the preferred node?
Is it consistent with the administrator's expectations?"

It seems there is no direct answer. But the kernel already knows this
information, so IMHO exposing it to users is not a bad choice.

This information can help them adjust/tune the machine before actually
deploying their workloads.
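
As a rough sketch of that kind of pre-deployment check (not part of the patch),
a user space script could read the two per-node files proposed earlier in this
thread and compare the preferred node with what the administrator expects. It
assumes the files demotion_preferred_nodes and demotion_allowed_nodes exist and
hold a node list such as "2-5"; neither is existing kernel ABI, and the helper
names are made up for illustration.

```python
import os

# Assumption: the files proposed in this thread, not an existing kernel ABI.
PROPOSED_FILES = ("demotion_preferred_nodes", "demotion_allowed_nodes")


def parse_nodelist(text):
    """Expand a list such as "2-5" or "0,3-4" into sorted node IDs; empty -> []."""
    nodes = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def read_demotion(node, base="/sys/devices/system/node"):
    """Return (preferred, allowed) for one node; a missing file reads as []."""
    lists = []
    for name in PROPOSED_FILES:
        path = os.path.join(base, "node%d" % node, name)
        try:
            with open(path) as f:
                lists.append(parse_nodelist(f.read()))
        except OSError:
            lists.append([])
    return tuple(lists)


def find_mismatches(expected, base="/sys/devices/system/node"):
    """expected maps node -> expected preferred nodes.

    Returns {node: (expected, actual)} for every node that differs.
    """
    mismatches = {}
    for node, want in expected.items():
        preferred, _allowed = read_demotion(node, base)
        if sorted(want) != preferred:
            mismatches[node] = (sorted(want), preferred)
    return mismatches
```

For the example topology quoted above, find_mismatches({0: [2], 1: [5]}) would
return an empty dict if the kernel built the demotion path the administrator
expected.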

If the sysfs approach isn't good enough, is there another, more user-friendly
way to convey this information? For example, as is done for the allocation
fallback order, simply print it to dmesg?


Thanks
Zhijian


>
> Good question, I have no answer yet, but I think we can get the fallback order
> from the dmesg now.
>
> Our next step is to also try to improve the user space tools, such as numactl,
> to show the demotion path with the help of this exposed information.