Re: [PATCH v2 4/4] mm: thp: reparent the split queue during memcg offline

From: Qi Zheng
Date: Thu Sep 25 2025 - 02:12:19 EST


Hi David,

On 9/24/25 8:38 PM, David Hildenbrand wrote:
On 23.09.25 11:16, Qi Zheng wrote:
In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split queue
to its parent.

Similar to list_lru, the split queue is relatively independent and does
not need to be reparented along with objcg and LRU folios (holding
objcg lock and lru lock). So let's apply the same mechanism as list_lru
to reparent the split queue separately when memcg is offline.

Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
  include/linux/huge_mm.h |  2 ++
  include/linux/mmzone.h  |  1 +
  mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
  mm/memcontrol.c         |  1 +
  mm/mm_init.c            |  1 +
  5 files changed, 44 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc9852..a0d4b751974d2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -417,6 +417,7 @@ static inline int split_huge_page(struct page *page)
      return split_huge_page_to_list_to_order(page, NULL, ret);
  }
  void deferred_split_folio(struct folio *folio, bool partially_mapped);
+void reparent_deferred_split_queue(struct mem_cgroup *memcg);
  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
          unsigned long address, bool freeze);
@@ -611,6 +612,7 @@ static inline int try_folio_split(struct folio *folio, struct page *page,
  }
  static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
+static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
  #define split_huge_pmd(__vma, __pmd, __address)    \
      do { } while (0)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c57250..f3eb81fee056a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1346,6 +1346,7 @@ struct deferred_split {
      spinlock_t split_queue_lock;
      struct list_head split_queue;
      unsigned long split_queue_len;
+    bool is_dying;

It's a bit weird to query whether the "struct deferred_split" is dying. Shouldn't this be a memcg property? (and in particular, not exist for the pglist_data part where it might not make sense at all?).

Maybe:

#ifdef CONFIG_MEMCG
bool is_dying;
#endif

There is indeed a CSS_DYING flag. But we must modify 'is_dying' under
the protection of the split_queue_lock, otherwise the folio may be added
back to the deferred_split queue of the child memcg.

  };
  #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 48b51e6230a67..de7806f759cba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1094,9 +1094,15 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio)
      struct deferred_split *queue;
      memcg = folio_memcg(folio);
+retry:
      queue = memcg ? &memcg->deferred_split_queue :
              &NODE_DATA(folio_nid(folio))->deferred_split_queue;
      spin_lock(&queue->split_queue_lock);
+    if (unlikely(queue->is_dying == true)) {

if (unlikely(queue->is_dying))

Will do.


+        spin_unlock(&queue->split_queue_lock);
+        memcg = parent_mem_cgroup(memcg);
+        goto retry;
+    }
      return queue;
  }
@@ -1108,9 +1114,15 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
      struct deferred_split *queue;
      memcg = folio_memcg(folio);
+retry:
      queue = memcg ? &memcg->deferred_split_queue :
              &NODE_DATA(folio_nid(folio))->deferred_split_queue;
      spin_lock_irqsave(&queue->split_queue_lock, *flags);
+    if (unlikely(queue->is_dying == true)) {

if (unlikely(queue->is_dying))

Will do.


+        spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
+        memcg = parent_mem_cgroup(memcg);
+        goto retry;
+    }
      return queue;
  }
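The retry loop added to folio_split_queue_lock() and folio_split_queue_lock_irqsave() can be modelled the same way. Again this is only a sketch under assumptions: struct cgroup_model, split_queue_lock() and deferred_add() are hypothetical names, an atomic_flag spinlock stands in for split_queue_lock, and the NULL-memcg fallback to the per-node queue is left out. It assumes the root queue is never marked dying, so the walk up the hierarchy always terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not kernel types. */
struct split_queue {
	atomic_flag lock;   /* models split_queue_lock */
	unsigned long len;  /* models split_queue_len */
	bool is_dying;      /* only read or written with lock held */
};

struct cgroup_model {
	struct cgroup_model *parent;  /* NULL for the root */
	struct split_queue queue;
};

static void cg_init(struct cgroup_model *cg, struct cgroup_model *parent)
{
	cg->parent = parent;
	atomic_flag_clear(&cg->queue.lock);
	cg->queue.len = 0;
	cg->queue.is_dying = false;
}

static void sq_lock(struct split_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin */
}

static void sq_unlock(struct split_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/*
 * Models the retry loop: lock the cgroup's queue; if it was marked dying
 * (which happens under this same lock), drop the lock and try the parent.
 * Returns with the chosen queue locked. Assumes the root never dies.
 */
static struct split_queue *split_queue_lock(struct cgroup_model *cg)
{
	struct split_queue *q;
retry:
	q = &cg->queue;
	sq_lock(q);
	if (q->is_dying) {
		sq_unlock(q);
		cg = cg->parent;
		goto retry;
	}
	return q;
}

/* Hypothetical adder: queue one folio (modelled as a counter bump). */
static void deferred_add(struct cgroup_model *cg)
{
	struct split_queue *q = split_queue_lock(cg);

	q->len++;
	sq_unlock(q);
}
```

Because is_dying is only read and written under the queue's own lock, a caller that sees it clear can safely queue on that memcg: in this model, reparenting cannot mark the queue dying until the caller drops the lock.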

Nothing else jumped at me, but I am not a memcg expert :)

Thanks,
Qi