Re: [RFC PATCH 1/2] mm/demotion: Expose memory type details via sysfs

From: Aneesh Kumar K.V
Date: Sun Aug 28 2022 - 12:21:20 EST


Wei Xu <weixugc@xxxxxxxxxx> writes:

>
> On Fri, Aug 26, 2022 at 1:05 AM Aneesh Kumar K V
> <aneesh.kumar@xxxxxxxxxxxxx> wrote:
>>
>> On 8/26/22 1:30 PM, Wei Xu wrote:
>> > On Thu, Aug 25, 2022 at 8:00 PM Aneesh Kumar K V
>> > <aneesh.kumar@xxxxxxxxxxxxx> wrote:
>> >>
>> >> On 8/26/22 7:20 AM, Huang, Ying wrote:
>> >>> "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxx> writes:
>> >>>
>> >>>> This patch adds /sys/devices/virtual/memtier/ where all memory tier related
>> >>>> details can be found. All allocated memory types will be listed there as
>> >>>> /sys/devices/virtual/memtier/memtypeN/
>> >>>
>> >>> Another choice is to make memory types and memory tiers system devices.
>> >>> That is,
>> >>>
>> >>> /sys/devices/system/memory_type/memory_typeN
>> >>> /sys/devices/system/memory_tier/memory_tierN
>> >>>
>> >>
>> >> subsys_system_register() documentation says
>> >>
>> >> * Do not use this interface for anything new, it exists for compatibility
>> >> * with bad ideas only. New subsystems should use plain subsystems; and
>> >> * add the subsystem-wide attributes should be added to the subsystem
>> >> * directory itself and not some create fake root-device placed in
>> >> * /sys/devices/system/<name>.
>> >>
>> >> memtier being a virtual device, I was under the impression that /sys/devices/virtual
>> >> is the recommended place.
>> >>
>> >>> That looks more natural to me. Because we already have "node" and
>> >>> "memory" devices there. Why don't you put memory types and memory tiers
>> >>> there?
>> >>>
>> >>> And, I think we shouldn't put "memory_type" in the "memory_tier"
>> >>> directory. "memory_type" isn't a part of "memory_tier".
>> >>>
>> >>
>> >> I was looking at consolidating both memory tier and memory type into the same sysfs subsystem.
>> >> Your recommendation implies we create two subsystems, memory_tier and memtype. I was
>> >> trying to avoid that. Maybe a generic term like "memory_tiering" can help to
>> >> consolidate all tiering related details there?
>> >>
>> >
>> > A generic term "memory_tiering" sounds good to me.
>> >
>> > Given that this will be a user-facing, stable kernel API, I think we'd
>> > better add only what is most useful for userspace; we don't have to
>> > mirror the kernel internal data structures in this interface.
>> >
>> > My understanding is that we haven't fully settled down on how to
>> > customize memory tiers from userspace. So we don't have to show
>> > memory_type yet, which is a kernel data structure at this point.
>> >
>> > The userspace does need to know what are the memory tiers and which
>> > NUMA nodes are included in each memory tier. How about we provide the
>> > "nodelist" interface for each memory tier as in the original proposal?
>> >
>> > The userspace would also like to know which memory tiers/nodes belong
>> > to the top tiers (the promotion targets). We can provide a "toptiers"
>> > or "toptiers_nodelist" interface to report that.
>> >
>>
>> How about also including abstract distance range of a memory tier?
>> That will be useful to derive the hierarchy.
>
> With the base abstract distance in the memtier name, do we need to
> show the abstract distance range if we don't customize memory tiers?
>

IMHO it would be simpler to let userspace find the abstract distance by reading
a file rather than parsing a file name.
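
To illustrate the difference from the userspace side, here is a rough
sketch of the two approaches. Nothing in it is a settled interface: the
"memory_tier<N>" directory naming and the "abstract_distance" attribute
file are assumptions made up for this example only.

```python
import re
from pathlib import Path

# Hypothetical layout, not a settled ABI: each tier directory is assumed to
# be named "memory_tier<N>" and, for the second variant, to contain an
# "abstract_distance" attribute file holding one decimal integer.


def distance_from_name(tier_dir: Path) -> int:
    """Variant A: recover the base abstract distance by parsing the name."""
    match = re.fullmatch(r"memory_tier(\d+)", tier_dir.name)
    if match is None:
        # Userspace has to know the naming scheme and handle mismatches.
        raise ValueError(f"unexpected tier name: {tier_dir.name!r}")
    return int(match.group(1))


def distance_from_file(tier_dir: Path) -> int:
    """Variant B: read the value from a dedicated attribute file."""
    return int((tier_dir / "abstract_distance").read_text().strip())
```

With the attribute file, userspace does not depend on the naming scheme
of the directory, so the name can change without breaking the parser.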

-aneesh