Re: [PATCH RFC 1/4] drivers/base/node: Add demotion_nodes sys interface

From: Huang, Ying
Date: Fri Feb 02 2024 - 03:24:34 EST


"Zhijian Li (Fujitsu)" <lizhijian@xxxxxxxxxxx> writes:

> On 31/01/2024 11:17, Li Zhijian wrote:
>>>> node[0].preferred = 2
>>>> node[0].demotion_targets = 2-5
>>>> node[1].preferred = 5
>>>> node[1].demotion_targets = 2-5
>>>> node[2].preferred = 4
>>>> node[2].demotion_targets = 3-4
>>>> node[3].preferred = <empty>
>>>> node[3].demotion_targets = <empty>
>>>> node[4].preferred = <empty>
>>>> node[4].demotion_targets = <empty>
>>>> node[5].preferred = 3
>>>> node[5].demotion_targets = 3-4
>>>>
>>>> But this demotion path is not explicitly known to the administrator. And
>>>> according to feedback from our customers, they also think it is helpful to
>>>> know the demotion path built by the kernel to understand the demotion behaviors.
>>>>
>>>> So I think we should have 2 new interfaces for each node:
>>>>
>
>>>> /sys/devices/system/node/nodeN/demotion_allowed_nodes
>>>> /sys/devices/system/node/nodeN/demotion_preferred_nodes
>>>>
>>>> I value your opinion, and I'd like to know what you think about...
>>>
>>> Per my understanding, we will not expose everything inside kernel to
>>> user space.  For page placement in a tiered memory system, demotion is
>>> just a part of the story.  For example, if the DRAM of a system becomes
>>> full, new page allocation will fall back to the CXL memory.  Have we
>>> exposed the default page allocation fallback order to user space?
>
>
> Back to our initial requirement:
> When demotion is enabled, what is the demotion path, especially the preferred node?
> Is it consistent with the administrator's expectations?
>
> It seems there is no direct answer. But the kernel already knows this
> information, so IMHO, exposing it to users is not a bad choice.
>
> This information can help them adjust/tune the machine before actually
> deploying their workloads.
>
> If the sysfs approach isn't good enough, is it possible to have another, more
> user-friendly way to convey this information? For example, as is done for the
> allocation fallback order, simply print it to dmesg?

I have no objection to printing some demotion information in dmesg.

--
Best Regards,
Huang, Ying

>
>>
>> Good question, I have no answer yet, but I think we can get the fallback order
>> from the dmesg now.
>>
>> As a further action, we will also try to improve user space tools such as
>> numactl to show the demotion path with the help of this exposed information.
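
For illustration only, here is a minimal Python sketch of how a user space
tool might consume the two files proposed in this thread. It assumes the files
demotion_preferred_nodes and demotion_allowed_nodes exist under each nodeN
directory and hold a kernel-style node list (for example "2-5", or empty when
there are no targets). These files are an RFC proposal and are not a merged
kernel ABI. The helper names parse_nodelist and read_demotion_info are
hypothetical.

```python
import os


def parse_nodelist(text):
    """Expand a node list such as "2-5" or "0,3-4" into a list of ints."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty content means "no targets"
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        else:
            nodes.append(int(part))
    return nodes


def read_demotion_info(node, base="/sys/devices/system/node"):
    """Read the two proposed per-node files (assumed names, not merged ABI).

    A missing file is treated the same as an empty one.
    """
    info = {}
    for name in ("demotion_preferred_nodes", "demotion_allowed_nodes"):
        path = os.path.join(base, f"node{node}", name)
        try:
            with open(path) as f:
                info[name] = parse_nodelist(f.read())
        except FileNotFoundError:
            info[name] = []
    return info
```

With the example topology quoted above, node 0 would be expected to report a
preferred list of [2] and an allowed list of [2, 3, 4, 5], while nodes 3 and 4
would report empty lists. The base parameter exists only so the sketch can be
pointed at a test directory instead of the real sysfs tree.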