Re: [PATCH v5 0/9] mm: switch THP shrinker to list_lru
From: Johannes Weiner
Date: Tue Jun 02 2026 - 17:46:17 EST
On Mon, Jun 01, 2026 at 04:36:52PM +0800, Lance Yang wrote:
> As the changelog above says, the old queue is per-memcg only, rather
> than per-memcg-per-node. So reclaim on one node can still walk the whole
> memcg queue and split underused THPs from other nodes in the same memcg.
>
> But I think the new one can lose reclaim in the cgroup.memory=nokmem
> case ...
>
> With nokmem, the deferred shrinker can still run from memcg reclaim,
> because it is SHRINKER_NONSLAB. But the list_lru is no longer per-memcg:
>
> __list_lru_init() clears memcg_aware,
>
> 	if (mem_cgroup_kmem_disabled())
> 		memcg_aware = false;
>
> so list_lru_from_memcg_idx() falls back to the shared node list:
>
> static inline struct list_lru_one *
> list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
> {
> 	if (list_lru_memcg_aware(lru) && idx >= 0) {
> 		[...]
> 	}
> 	return &lru->node[nid].lru;
> }
>
> That makes the shrinker bit unreliable. __list_lru_add() still sets the
> bit on the memcg passed in, but only when the list goes from empty to
> non-empty:
>
> bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
> 		    struct list_head *item, int nid,
> 		    struct mem_cgroup *memcg)
> {
> 	if (list_empty(item)) {
> 		[...]
> 		if (!l->nr_items++)
> 			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
> 		[...]
> 		return true;
> 	}
> 	return false;
> }
>
> If memcg A adds the first folio, A gets the bit. If memcg B later adds a
> folio to the same shared list, B does not get a bit, because the list
> was already non-empty.
>
> So in the A-first/B-later case, reclaim from B may not call the deferred
> shrinker at all. The shared list is scanned from memcg reclaim only if
> reclaim runs from the memcg that has the bit, such as A here, or from
> global reclaim :)
>
> Anyway, only after the shared list is emptied does the next memcg to add
> a folio get to be the one with the bit, IIUC :)
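
For anyone following along, the A-first/B-later case above can be
reproduced in a small userspace toy model. This is only a sketch, not
kernel code: struct toy_lru, toy_lru_list() and toy_lru_add() are
made-up names, a memcg is just an array index, and a list is reduced
to its item count.

```c
/* Userspace toy model, NOT kernel code. Every name here is hypothetical. */
#include <stdbool.h>

#define TOY_NR_MEMCGS 2

struct toy_lru {
	bool memcg_aware;			/* false models cgroup.memory=nokmem */
	long shared_nr_items;			/* models lru->node[nid].lru */
	long memcg_nr_items[TOY_NR_MEMCGS];	/* models the per-memcg lists */
	bool shrinker_bit[TOY_NR_MEMCGS];	/* models each memcg's shrinker bit */
};

/* Pick the per-memcg list if memcg aware, else fall back to the shared list. */
long *toy_lru_list(struct toy_lru *lru, int memcg)
{
	if (lru->memcg_aware)
		return &lru->memcg_nr_items[memcg];
	return &lru->shared_nr_items;
}

/* Like __list_lru_add(): the bit is set only on the empty -> non-empty edge. */
void toy_lru_add(struct toy_lru *lru, int memcg)
{
	long *nr_items = toy_lru_list(lru, memcg);

	if (!(*nr_items)++)
		lru->shrinker_bit[memcg] = true;
}
```

With memcg_aware cleared, calling toy_lru_add() for memcg 0 and then
for memcg 1 leaves only memcg 0's bit set while the shared count is 2.
With memcg_aware set, both memcgs get their own bit.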

Sorry for the delay, this took me a bit to think about. The shrinker
code is a mess.

I read it the same way you do. And this is true for all list_lru users
when nokmem is set: we just set random nonsense shrinker bits.

HOWEVER, the generic shrinker code fixes that up by IGNORING random
shrinker bits like this when !memcg_kmem_online(). And shrinking
correctly happens only against the shared root queue when the reclaim
iterator walks root_mem_cgroup.

HOWEVER, the THP shrinker explicitly sets SHRINKER_NONSLAB, which in
turn overrides the previous override. So yes there is a weirdness: we
get the root cgroup invocation against the shared queue, and then one
more time triggered by that random memcg bit.
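
To make the two overrides concrete, here is a rough userspace model of
the dispatch decision as described above. It is a sketch under
assumptions, not the kernel code: TOY_NONSLAB stands in for
SHRINKER_NONSLAB, the function names are invented, and it assumes a
single reclaim walk that visits root_mem_cgroup plus every child memcg.

```c
/* Userspace toy model, NOT kernel code. Every name here is hypothetical. */
#include <stdbool.h>

#define TOY_NONSLAB 0x1U	/* stands in for SHRINKER_NONSLAB */

/* Does memcg reclaim invoke the shrinker for one non-root memcg? */
bool toy_memcg_invokes(unsigned int flags, bool bit_set, bool kmem_online)
{
	if (!bit_set)
		return false;
	/* Stale bits are ignored when kmem is off, unless NONSLAB is set. */
	if (!kmem_online && !(flags & TOY_NONSLAB))
		return false;
	return true;
}

/*
 * nokmem case only: count how often one walk over the hierarchy scans
 * the shared queue. bits[i] is the shrinker bit of non-root memcg i.
 */
int toy_shared_scans_nokmem(unsigned int flags, const bool *bits, int nr_memcgs)
{
	int scans = 1;	/* the root_mem_cgroup pass hits the shared queue */
	int i;

	for (i = 0; i < nr_memcgs; i++)
		if (toy_memcg_invokes(flags, bits[i], false))
			scans++;
	return scans;
}
```

With bits = { true, false }, the model reports two scans of the shared
queue when TOY_NONSLAB is set and one scan when it is not, which is
the extra invocation described above.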

The most direct fix is to just drop SHRINKER_NONSLAB. It declares
independence from kmem, which is no longer true.

Cleaning up the shrinker code is left for another day.

Andrew, if there are no objections, can you please fold this?

---