Re: [RFC] memory tiering: use small chunk size and more tiers

From: Bharata B Rao
Date: Fri Oct 28 2022 - 09:54:16 EST


On 10/28/2022 2:03 PM, Huang, Ying wrote:
> Bharata B Rao <bharata@xxxxxxx> writes:
>
>> On 10/28/2022 11:16 AM, Huang, Ying wrote:
>>> If my understanding were correct, you think the latency / bandwidth of
>>> these NUMA nodes will near each other, but may be different.
>>>
>>> Even if the latency / bandwidth of these NUMA nodes isn't exactly same,
>>> we should deal with that in memory types instead of memory tiers.
>>> There's only one abstract distance for each memory type.
>>>
>>> So, I still believe we will not have many memory tiers with my proposal.
>>>
>>> I don't care too much about the exact number, but want to discuss some
>>> general design choice,
>>>
>>> a) Avoid to group multiple memory types into one memory tier by default
>>> at most times.
>>
>> Do you expect the abstract distances of two different types to be
>> close enough in real life (like you showed in your example with
>> CXL - 5000 and PMEM - 5100) that they will get assigned into same tier
>> most times?
>>
>> Are you foreseeing that abstract distance that get mapped by sources
>> like HMAT would run into this issue?
>
> Only if we set abstract distance chunk size large. So, I think that
> it's better to set chunk size as small as possible to avoid potential
> issue. What is the downside to set the chunk size small?

I don't see any downside in particular. However,

- With just two memory types (default_dram_type and dax_slowmem_type
with adistance values of 576 and 576*5 respectively) defined currently,
- With no interface yet to set/change adistance value of a memory type,
- With no defined way to convert the performance characteristics info
(bw and latency) from sources like HMAT into an adistance value,

I find it a bit difficult to see how a chunk size of 10 instead of the
existing 128 would be more useful.
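
To make the arithmetic concrete, here is a rough userspace sketch. It
assumes a tier is identified by the adistance rounded down to a multiple
of the chunk size. tier_start() and same_tier() are hypothetical helpers
for illustration, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: start of the chunk that contains adistance.
 * Assumption: all adistance values inside one chunk share a tier.
 */
int tier_start(int adistance, int chunk_size)
{
	assert(chunk_size > 0 && adistance >= 0);
	return adistance - (adistance % chunk_size);
}

/* Hypothetical helper: do two adistance values fall in the same chunk? */
bool same_tier(int a, int b, int chunk_size)
{
	return tier_start(a, chunk_size) == tier_start(b, chunk_size);
}
```

Under that assumption, 576 and 576*5 = 2880 start at 512 and 2816 with a
chunk size of 128, and at 570 and 2880 with a chunk size of 10, so the
two existing types are in different tiers either way. The two sizes only
differ for values like the CXL - 5000 and PMEM - 5100 example, which
share a chunk (starting at 4992) with 128 but not with 10.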

Regards,
Bharata.