Re: [PATCH v5 1/9] mm/demotion: Add support for explicit memory tiers

From: Johannes Weiner
Date: Thu Jun 09 2022 - 16:41:12 EST


On Thu, Jun 09, 2022 at 03:22:43PM +0100, Jonathan Cameron wrote:
> I think discussion hinged on it making sense to be able to change
> rank of a tier rather than create a new tier and move things one by one.
> Example was wanting to change the rank of a tier that was created
> either by core code or a subsystem.
>
> E.g. If GPU driver creates a tier, assumption is all similar GPUs will
> default to the same tier (if hot plugged later for example) as the
> driver subsystem will keep a reference to the created tier.
> Hence if user wants to change the order of that relative to
> other tiers, the option of creating a new tier and moving the
> devices would then require us to have infrastructure to tell the GPU
> driver to now use the new tier for additional devices.

That's an interesting point, thanks for explaining.

But that could still happen when two drivers report the same tier and
one of them is wrong, right? You'd still need to separate out by hand
to adjust rank, as well as handle hotplug events. Driver collisions
are probable with coarse categories like gpu, dram, pmem.

Would it make more sense to have the platform/devicetree/driver
provide more fine-grained distance values similar to NUMA distances,
and have a driver-scope tunable to override/correct? And then have the
distance value function as the unique tier ID and rank in one.

That would allow device class reassignments, too, and it would work
with driver collisions where simple "tier stickiness" would
not. (Although collisions would be less likely to begin with given a
broader range of possible distance values.)
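
A rough userspace sketch of "distance as tier ID and rank in one", under
stated assumptions: struct mem_node, build_tiers() and the distance numbers
are hypothetical names and values for illustration, not anything from the
patch series. Devices that report the same distance land in the same tier,
and sorting by distance gives the rank order directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one entry per memory device, with a reported distance. */
struct mem_node {
	const char *name;
	int distance;	/* smaller = closer; doubles as tier ID and rank */
};

static int cmp_distance(const void *a, const void *b)
{
	const struct mem_node *x = a, *y = b;

	return (x->distance > y->distance) - (x->distance < y->distance);
}

/*
 * Sort nodes by distance and collect the distinct distances, in rank
 * order, into tiers[] (which must have room for n entries).
 * Returns the number of tiers found.
 */
static size_t build_tiers(struct mem_node *nodes, size_t n, int *tiers)
{
	size_t ntiers = 0;

	assert(nodes && tiers);
	qsort(nodes, n, sizeof(*nodes), cmp_distance);
	for (size_t i = 0; i < n; i++)
		if (ntiers == 0 || tiers[ntiers - 1] != nodes[i].distance)
			tiers[ntiers++] = nodes[i].distance;
	return ntiers;
}
```

In this model, correcting a wrong driver is just changing one distance
value and rebuilding; there is no separate rank to keep in sync.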

Going further, it could be useful to separate the business of hardware
properties (and configuring quirks) from the business of configuring
MM policies that should be applied to the resulting tier hierarchy.
They're somewhat orthogonal tuning tasks, and one of them might become
obsolete before the other (if the quality of distance values provided
by drivers improves before the quality of MM heuristics ;). Separating
them might help clarify the interface for both designers and users.

E.g. a memdev class scope with a driver-wide distance value, and a
memdev scope for per-device values that default to "inherit driver
value". The memtier subtree would then have an r/o structure, but
allow tuning per-tier interleaving ratio[1], demotion rules etc.
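
The "inherit driver value" default could look roughly like the following.
This is only a sketch: MEMDEV_INHERIT, struct memdev_class, struct memdev
and memdev_effective_distance() are made-up names, and a real interface
would presumably sit behind sysfs attributes rather than plain struct
fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: the device has no override of its own. */
#define MEMDEV_INHERIT (-1)

/* Driver-wide (class) scope: one distance for all devices of a driver. */
struct memdev_class {
	const char *name;
	int distance;
};

/* Per-device scope: an optional override of the class value. */
struct memdev {
	const struct memdev_class *cls;
	int distance;	/* MEMDEV_INHERIT or a device-specific value */
};

/* Resolve the distance that would decide the device's tier. */
static int memdev_effective_distance(const struct memdev *dev)
{
	assert(dev && dev->cls);
	if (dev->distance == MEMDEV_INHERIT)
		return dev->cls->distance;
	return dev->distance;
}
```

Changing the class value then moves every inheriting device, including
ones hotplugged later, while per-device overrides stay where they are.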

[1] https://lore.kernel.org/linux-mm/20220607171949.85796-1-hannes@xxxxxxxxxxx/#t