Re: [PATCH] acpi/hmat,mm/memtier: always register hmat adist calculation callback

From: Gregory Price
Date: Tue Jul 30 2024 - 12:00:24 EST


On Tue, Jul 30, 2024 at 09:12:55AM +0800, Huang, Ying wrote:
> > Right now HMAT appears to be used prescriptively, this despite the fact
> > that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
> > the memory-tier code. So this patch simply realizes this intent when the
> > hints are not very reasonable.
>
> If HMAT isn't available, it's hard to put memory devices to
> appropriate memory tiers without other information. In commit
> 992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
> Aneesh pointed out that it doesn't work for his system to put
> non-CPU-nodes in lower tier.
>

Per Aneesh in 992bf77591cb, the code explicitly states that the intent
is to put non-CPU-nodes in a lower tier by default:


    The current implementation puts all nodes with CPU into the highest
    tier, and builds the tier hierarchy by establishing the per-node
    demotion targets based on the distances between nodes.

This is accurate for the current code.


    The current tier initialization code always initializes each
    memory-only NUMA node into a lower tier.

This is *broken* for the currently upstream code.

This appears to be the result of the hmat adistance callback introduction
(though it may have been broken before that).
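
To make the distinction concrete, here is a toy sketch of the two
behaviours. It is not kernel code: every name in it (toy_node,
toy_default_tier, toy_hinted_tier) is hypothetical, and it assumes a
simplified two-tier model in which a firmware hint, when present, simply
overrides the default placement. The real logic is driven by abstract
distance and is more involved.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_tier { TOY_TIER_TOP = 0, TOY_TIER_LOWER = 1 };

struct toy_node {
	bool has_cpu;		/* node has CPUs attached */
	bool has_hint;		/* firmware (e.g. HMAT) supplied a hint */
	bool hint_is_fast;	/* hint claims DRAM-class performance */
};

/* Stated default: CPU nodes on top, memory-only nodes in a lower tier. */
static enum toy_tier toy_default_tier(const struct toy_node *n)
{
	return n->has_cpu ? TOY_TIER_TOP : TOY_TIER_LOWER;
}

/*
 * Prescriptive use of hints (assumed for illustration): a hint, when
 * present, overrides the CPU/non-CPU default entirely.
 */
static enum toy_tier toy_hinted_tier(const struct toy_node *n)
{
	if (n->has_hint)
		return n->hint_is_fast ? TOY_TIER_TOP : TOY_TIER_LOWER;
	return toy_default_tier(n);
}
```

In this model a memory-only node whose hint claims DRAM-class
performance lands in the top tier under toy_hinted_tier(), while
toy_default_tier() keeps it in the lower tier regardless of the hint.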

~Gregory