Re: [RFC PATCH 0/9 v2] mm/memcontrol: Make memory cgroup limits tier-aware

From: David Hildenbrand (Arm)

Date: Mon May 11 2026 - 12:22:50 EST


On 4/23/26 22:34, Joshua Hahn wrote:
> INTRODUCTION
> ============
> Memory cgroups provide an interface that allows multiple workloads on a host to
> co-exist via weak and strong memory isolation guarantees. This works because,
> for the most part, all memory has equal utility. Isolating a cgroup’s memory
> footprint restricts how much it can hurt other workloads competing for memory,
> or protects it from other cgroups looking for more memory.
>
> However, on systems with tiered memory (e.g. CXL), memory utility is no longer
> homogeneous; toptier and lowtier memory provide different performance
> characteristics and have different scarcity, meaning memory footprint no longer
> serves as an accurate representation of a cgroup’s consumption of the system’s
> limited resources. As an extreme example, a cgroup with 10G of toptier
> (e.g. DRAM) memory and a cgroup with 10G of lowtier (e.g. CXL) memory both
> appear to be consuming the same amount of system resources from memcg’s
> perspective, despite the performance asymmetry between the two workloads.
>
> Therefore on tiered systems, memory isolation cannot currently happen, as
> workloads that are well-behaved within their memcg limits may still hurt the
> performance of other well-behaving workloads by hogging more than their
> “fair share” of toptier memory.
>
> Introduce tier-aware memcg limits, which establish independent toptier limits
> that scale with the memory limits and the ratio of toptier:total memory
> available on the system.
>
> INTERFACE
> =========
> This series introduces only one adjustable knob to userspace: a new cgroup mount
> option “memory_tiered_limits” which toggles whether the cgroup mount will scale
> toptier limits. It also introduces 4 new read-only sysfs entries per-cgroup:
> memory.toptier_{min, low, high, max}.
>
> The new toptier memory limits are scaled according to the amount of toptier
> memory and total memory available on the system as such:
>
> memory.toptier_high = (toptier_mem / total_mem) * memory.high
>
> For instance, on a host with 100G memory, with 75G toptier and 25G CXL, the
> “toptier ratio” would be 75 / 100 = 0.75. A cgroup with the following memcg
> limits {min: 8G, low: 12G, high: 20G, max: 24G} might see toptier limits scaled
> at {min: 6G, low: 9G, high: 15G, max: 18G}.
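
For reference, the scaling quoted above can be reproduced with a few lines of
arithmetic. This is only a minimal sketch of the formula as stated in the cover
letter, not the series' actual implementation: the helper name
scale_toptier_limits and the use of exact fractions are assumptions made for
illustration, and the real code may round differently.

```python
from fractions import Fraction

GIB = 1 << 30  # bytes per GiB


def scale_toptier_limits(limits, toptier_bytes, total_bytes):
    """Scale each memcg limit by the toptier:total ratio.

    Hypothetical helper: mirrors the formula from the cover letter,
    memory.toptier_X = (toptier_mem / total_mem) * memory.X
    """
    ratio = Fraction(toptier_bytes, total_bytes)
    # int() truncates toward zero; the kernel's rounding is not specified here.
    return {name: int(value * ratio) for name, value in limits.items()}


# Example from the cover letter: 75G toptier out of 100G total.
limits = {"min": 8 * GIB, "low": 12 * GIB, "high": 20 * GIB, "max": 24 * GIB}
toptier = scale_toptier_limits(limits, 75 * GIB, 100 * GIB)

for name, value in toptier.items():
    print(f"memory.toptier_{name} = {value // GIB}G")
```

Note that the sketch takes a single system-wide ratio as input, so it only
models a two-tier split (toptier vs. everything else).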

Assume you have a bigger hierarchy (HBM, DRAM, CXL), or assume you have multiple
NUMA nodes with a hierarchy each.

Your proposal doesn't really seem to be very versatile, or am I wrong?

--
Cheers,

David