Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru - [s390] panic in __memcg_list_lru_alloc
From: Johannes Weiner
Date: Mon Mar 30 2026 - 17:00:53 EST

On Mon, Mar 30, 2026 at 04:41:16PM -0400, Johannes Weiner wrote:
> Hello Mikhail,
>
> On Mon, Mar 30, 2026 at 06:37:01PM +0200, Mikhail Zaslonko wrote:
> > with this series in linux-next (since next-20260324) I see a reproducible panic on s390 in the
> > dump kernel when running NVMe standalone dump (ngdump).
> > This only happens in the 'capture kernel', normal boot of the same kernel works fine.
> >
> > [ 14.350676] Unable to handle kernel pointer dereference in virtual kernel address space
> > [ 14.350682] Failing address: 4000000000000000 TEID: 4000000000000803 ESOP-2 FSI
> > [ 14.350686] Fault in home space mode while using kernel ASCE.
> > [ 14.350689] AS:0000000002798007 R3:000000002d2c4007 S:000000002d2c3001 P:000000000000013d
> > [ 14.350730] Oops: 0038 ilc:3 [#1]SMP
> > [ 14.350735] Modules linked in: dm_service_time zfcp scsi_transport_fc uvdevice diag288_wdt nvme prng aes_s390 nvme_core des_s390 libdes zcrypt_cex4 dm_mirror dm_region_hash dm_log scsi_dh_rdac scsi_dh_emc scsi_dh_alua paes_s390 crypto_engine pkey_cca pkey_ep11 zcrypt rng_core pkey_pckmo pkey dm_multipath autofs4
> > [ 14.350760] CPU: 0 UID: 0 PID: 32 Comm: khugepaged Not tainted 7.0.0-rc5-next-20260324
> > [ 14.350762] Hardware name: IBM 3931 A01 704 (LPAR)
> > [ 14.350764] Krnl PSW : 0704d00180000000 000003ffe0443a82 (__memcg_list_lru_alloc+0x52/0x1d0)
> > [ 14.350774] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> > [ 14.350776] Krnl GPRS: 0000000000000402 00000000000bece0 0000000000000000 000003ffe1c17928
> > [ 14.350778] 00000000001c24ca 0000000000000000 0000000000000000 000003ffe1c17948
> > [ 14.350780] 0000000000000000 00000000000824c0 0000037200098000 4000000000000000
> > [ 14.350782] 0000000000782400 0000000000000001 0000037fe00f39b8 0000037fe00f3918
> > [ 14.350788] Krnl Code: 000003ffe0443a72: a7690000 lghi %r6,0
> > [ 14.350788] 000003ffe0443a76: e380f0a00004 lg %r8,160(%r15)
> > [ 14.350788] *000003ffe0443a7c: e3b080b80004 lg %r11,184(%r8)
> > [ 14.350788] >000003ffe0443a82: e330b9400012 lt %r3,2368(%r11)
> > [ 14.350788] 000003ffe0443a88: a7a40065 brc 10,000003ffe0443b52
> > [ 14.350788] 000003ffe0443a8c: e3b0f0a00004 lg %r11,160(%r15)
> > [ 14.350788] 000003ffe0443a92: ec68006f007c cgij %r6,0,8,000003ffe0443b70
> > [ 14.350788] 000003ffe0443a98: e300b9400014 lgf %r0,2368(%r11)
> > [ 14.350825] Call Trace:
> > [ 14.350826] [<000003ffe0443a82>] __memcg_list_lru_alloc+0x52/0x1d0
> > [ 14.350831] [<000003ffe044529a>] folio_memcg_list_lru_alloc+0xba/0x150
> > [ 14.350834] [<000003ffe04f279a>] alloc_charge_folio+0x18a/0x250
> > [ 14.350839] [<000003ffe04f34dc>] collapse_huge_page+0x8c/0x890
> > [ 14.350841] [<000003ffe04f4222>] collapse_scan_pmd+0x542/0x690
> > [ 14.350844] [<000003ffe04f65b4>] collapse_single_pmd+0x144/0x240
> > [ 14.350847] [<000003ffe04f69ce>] collapse_scan_mm_slot.constprop.0+0x31e/0x480
> > [ 14.350849] [<000003ffe04f6d3c>] khugepaged+0x20c/0x210
> > [ 14.350852] [<000003ffe019b0a8>] kthread+0x148/0x170
> > [ 14.350856] [<000003ffe0119fec>] __ret_from_fork+0x3c/0x240
> > [ 14.350860] [<000003ffe0ffa4b2>] ret_from_fork+0xa/0x30
> > [ 14.350865] Last Breaking-Event-Address:
> > [ 14.350865] [<000003ffe0445294>] folio_memcg_list_lru_alloc+0xb4/0x150
> > [ 14.350870] Kernel panic - not syncing: Fatal exception: panic_on_oops

Can you verify whether the kdump kernel boots with
cgroup_disable=memory?

I think there is an issue with how we call __list_lru_init(). The
existing callsites had their own memcg_kmem_online() guards. But the
THP one does not, so we're creating a memcg-aware list_lru, but the
do-while hierarchy walk in __memcg_list_lru_alloc() runs into a NULL
memcg.

Can you try the below on top of that -next checkout?

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1ccdd45b1d14..7c7024e33653 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -637,7 +637,7 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, struct shrinker *shr
 	else
 		lru->shrinker_id = -1;
 
-	if (mem_cgroup_kmem_disabled())
+	if (mem_cgroup_disabled() || mem_cgroup_kmem_disabled())
 		memcg_aware = false;
 #endif