Re: [PATCH] acpi/hmat,mm/memtier: always register hmat adist calculation callback

From: Huang, Ying
Date: Tue Jul 30 2024 - 21:14:06 EST


Gregory Price <gourry@xxxxxxxxxx> writes:

> On Tue, Jul 30, 2024 at 09:12:55AM +0800, Huang, Ying wrote:
>> > Right now HMAT appears to be used prescriptively, this despite the fact
>> > that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
>> > the memory-tier code. So this patch simply realizes this intent when the
>> > hints are not very reasonable.
>>
>> If HMAT isn't available, it's hard to put memory devices to
>> appropriate memory tiers without other information. In commit
>> 992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
>> Aneesh pointed out that it doesn't work for his system to put
>> non-CPU-nodes in lower tier.
>>
>
> Per Aneesh in 992bf77591cb - The code explicitly states the intent is
> to put non-CPU-nodes in a lower tier by default.
>
>
> The current implementation puts all nodes with CPU into the highest
> tier, and builds the tier hierarchy by establishing the per-node
> demotion targets based on the distances between nodes.

This describes the behavior before the commit, that is, the behavior
the commit changes. One of the most important issues he wanted to fix is:

* The current tier initialization code always initializes each
  memory-only NUMA node into a lower tier. But a memory-only NUMA node
  may have a high performance memory device (e.g. a DRAM-backed
  memory-only node on a virtual machine) and that should be put into a
  higher tier.
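
To make the difference concrete, here is a rough toy model of the two
policies. It is only a sketch and not the kernel code: the struct, the
enum, and both function names are hypothetical, and the real
implementation works with abstract distances rather than a two-value
tier id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tier ids for this toy model (not kernel definitions). */
enum toy_tier { TOY_TIER_TOP = 0, TOY_TIER_LOWER = 1 };

/* Hypothetical description of a NUMA node. */
struct toy_node {
	bool has_cpu;	/* node has CPUs attached */
	bool is_dram;	/* backing memory device performs like DRAM */
};

/*
 * Behavior the commit message complains about: only nodes with CPUs
 * reach the top tier, so every memory-only node lands in a lower tier.
 */
static inline enum toy_tier toy_tier_by_cpu(const struct toy_node *node)
{
	assert(node != NULL);
	return node->has_cpu ? TOY_TIER_TOP : TOY_TIER_LOWER;
}

/*
 * Behavior the commit aims for, under the assumption that device
 * performance is known: the memory device decides the tier, whether or
 * not the node has CPUs.
 */
static inline enum toy_tier toy_tier_by_device(const struct toy_node *node)
{
	assert(node != NULL);
	return node->is_dram ? TOY_TIER_TOP : TOY_TIER_LOWER;
}
```

With a DRAM-backed memory-only node (has_cpu false, is_dram true), the
first function returns the lower tier and the second returns the top
tier, which is the case described in the item above.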

> This is accurate for the current code
>
>
> The current tier initialization code always initializes each
> memory-only NUMA node into a lower tier.
>
> This is *broken* for the currently upstream code.
>
> This appears to be the result of the hmat adistance callback introduction
> (though it may have been broken before that).

No, this was changed in Aneesh's commit 992bf77591cb.

--
Best Regards,
Huang, Ying