Re: [PATCH] mm: switch deferred split shrinker to list_lru

From: Dave Chinner

Date: Wed Mar 11 2026 - 18:23:49 EST


On Wed, Mar 11, 2026 at 11:43:58AM -0400, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
>
> alloc/unmap
>   deferred_split_folio()
>     list_add_tail(memcg->split_queue)
>     set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
> for_each_zone_zonelist_nodemask(restricted_nodes)
>   mem_cgroup_iter()
>     shrink_slab(node, memcg)
>       shrink_slab_memcg(node, memcg)
>         if test_shrinker_bit(memcg, node, deferred_shrinker_id)
>           deferred_split_scan()
>             walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
>
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> memcg_list_lru_alloc(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
>
> The folio_test_partially_mapped() state is currently protected and
> serialized wrt LRU state by the deferred split queue lock. To
> facilitate the transition, add helpers to the list_lru API to allow
> caller-side locking.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
> include/linux/huge_mm.h    |   6 +-
> include/linux/list_lru.h   |  48 ++++++
> include/linux/memcontrol.h |   4 -
> include/linux/mmzone.h     |  12 --
> mm/huge_memory.c           | 326 +++++++++++--------------------------
> mm/internal.h              |   2 +-
> mm/khugepaged.c            |   7 +
> mm/list_lru.c              | 197 ++++++++++++++--------
> mm/memcontrol.c            |  12 +-
> mm/memory.c                |  52 +++---
> mm/mm_init.c               |  14 --
> 11 files changed, 310 insertions(+), 370 deletions(-)

Can you please split this up into multiple patches (i.e. one logical
change per patch) to make it easier to review?

i.e. just from the list-lru perspective, there are multiple complex
changes in the series - locking API changes, new locking primitives,
internally locked functions exposed to callers allowing external
locking, etc. These need to be looked at individually and in
isolation so we can actually discuss the finer details, and that's
almost impossible to do when they are all smashed into one massive
patch.

-Dave.
--
Dave Chinner
dgc@xxxxxxxxxx