Re: [PATCH v4 8/8] mm: switch deferred split shrinker to list_lru
From: Johannes Weiner
Date: Tue May 26 2026 - 16:10:51 EST
On Tue, May 26, 2026 at 05:09:22AM -0700, Usama Arif wrote:
> On Thu, 21 May 2026 11:02:14 -0400 Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > @@ -4446,36 +4377,20 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
> > count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
> > mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
> > -
> > }
> > } else {
> > /* partially mapped folios cannot become non-partially mapped */
> > VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
> > }
> > - if (list_empty(&folio->_deferred_list)) {
> > - struct mem_cgroup *memcg;
> > -
> > - memcg = folio_split_queue_memcg(folio, ds_queue);
> > - list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> > - ds_queue->split_queue_len++;
> > - if (memcg)
> > - set_shrinker_bit(memcg, folio_nid(folio),
> > - shrinker_id(deferred_split_shrinker));
> > - }
> > - split_queue_unlock_irqrestore(ds_queue, flags);
> > + __list_lru_add(&deferred_split_lru, lru, &folio->_deferred_list, nid, memcg);
> > + list_lru_unlock_irqrestore(lru, &flags);
> > + rcu_read_unlock();
>
> Can the shrinker bit end up on the wrong memcg here?
>
> deferred_split_folio() takes the lock via list_lru_lock_irqsave() with
> the original folio_memcg() of the folio. If that memcg is dying and
> already reparented, lock_list_lru_of_memcg() walks up parent_mem_cgroup()
> until it finds a live sublist and locks it (lru), but the memcg local
> variable still points at the dying child.
>
> __list_lru_add() then calls set_shrinker_bit(memcg, ...) with the
> original (dying / reparented) memcg, not the parent that actually owns
> the locked sublist where the folio was queued.
I think you're right. Good catch.
This looks like an existing list_lru issue, actually. Even before,
list_lru_add() would call lock_list_lru_of_memcg() which might do the
hierarchy walk. But then set_shrinker_bit() on the passed in memcg.
I'll fix this in list_lru first. It's a likely backport
candidate. Then rebase my patches on top.
> > @@ -1301,6 +1301,9 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long s
> > if (result != SCAN_SUCCEED)
> > goto out_nolock;
> >
> > + if (folio_memcg_alloc_deferred(folio))
> > + goto out_nolock;
> > +
>
> Over here, you will end up reporting success on allocation failure.
>
> Maybe set result to SCAN_ALLOC_HUGE_PAGE_FAIL?
Good point. Sashiko pointed that out as well. I have a follow-up fix.
Thanks for taking a look.