Re: [RFC -next] memcg: Optimize creation performance when LRU_GEN is enabled

From: Johannes Weiner

Date: Wed Nov 26 2025 - 12:15:19 EST

On Wed, Nov 19, 2025 at 08:37:22AM +0000, Chen Ridong wrote:
> From: Chen Ridong <chenridong@xxxxxxxxxx>
>
> With LRU_GEN=y and LRU_GEN_ENABLED=n, a performance regression occurs
> when creating a large number of memory cgroups (memcgs):
>
> # time mkdir testcg_{1..10000}
>
> real 0m7.167s
> user 0m0.037s
> sys 0m6.773s
>
> # time mkdir testcg_{1..20000}
>
> real 0m27.158s
> user 0m0.079s
> sys 0m26.270s
>
> In contrast, with LRU_GEN=n, creation of the same number of memcgs
> performs better:
>
> # time mkdir testcg_{1..10000}
>
> real 0m3.386s
> user 0m0.044s
> sys 0m3.009s
>
> # time mkdir testcg_{1..20000}
>
> real 0m6.876s
> user 0m0.075s
> sys 0m6.121s
>
> The root cause is that lru_gen node onlining uses hlist_nulls_add_tail_rcu,
> which traverses the entire list to find the tail. This traversal scales
> with the number of memcgs, even when LRU_GEN is runtime-disabled.
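
As a rough illustration of the scaling described in the quoted text, here is
a userspace model, not kernel code. It assumes a list that, like an hlist,
stores only a pointer to its first node, so each tail insertion has to walk
every existing node. The names struct node, struct head,
add_tail_count_steps() and total_steps_for() are hypothetical and exist only
for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model: the head only knows the first node. */
struct node {
	struct node *next;
};

struct head {
	struct node *first;
};

/* Append n at the tail; return how many nodes were walked to find it. */
static size_t add_tail_count_steps(struct head *h, struct node *n)
{
	size_t steps = 0;
	struct node **link;

	assert(h != NULL && n != NULL);
	link = &h->first;
	while (*link != NULL) {
		link = &(*link)->next;
		steps++;
	}
	n->next = NULL;
	*link = n;
	return steps;
}

/* Total nodes walked when appending count nodes one at a time. */
static size_t total_steps_for(struct node *nodes, size_t count)
{
	struct head h = { .first = NULL };
	size_t total = 0;

	for (size_t i = 0; i < count; i++)
		total += add_tail_count_steps(&h, &nodes[i]);
	return total;
}
```

In this model, appending n nodes walks n * (n - 1) / 2 nodes in total, so
doubling the count roughly quadruples the work. That is the same shape as
the quoted timings going from about 7s to about 27s of real time.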

Can you please look into removing the memcg LRU instead?

Use mem_cgroup_iter() with a reclaim cookie in shrink_many(), like we
do in shrink_node_memcgs().

The memcg LRU is complicated, and it only works for global reclaim; if
you have a subtree with a memory.max at the top, it'll go through
shrink_node_memcgs() already anyway.
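
To make the cookie idea concrete, here is a minimal userspace sketch, not the
kernel implementation. It assumes the groups can be modeled as indices
0..ngroups-1 and that the cookie only has to remember where the previous walk
stopped. struct walk_cookie and walk_groups() are hypothetical names for this
illustration and are not the mem_cgroup_iter() API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared reclaim cookie: it remembers the
 * position at which the previous walk stopped. */
struct walk_cookie {
	size_t next;
};

/*
 * Visit up to budget groups out of ngroups, resuming where the cookie
 * left off and wrapping around at the end. The visited indices are
 * written to out; the number of visits is returned.
 */
static size_t walk_groups(struct walk_cookie *cookie, size_t ngroups,
			  size_t budget, size_t *out)
{
	size_t visited = 0;

	assert(cookie != NULL && out != NULL);
	if (ngroups == 0)
		return 0;
	while (visited < budget && visited < ngroups) {
		out[visited++] = cookie->next;
		cookie->next = (cookie->next + 1) % ngroups;
	}
	return visited;
}
```

Successive calls with the same cookie pick up where the last one ended, so
every group gets its turn over time without a separate list to maintain.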