Re: [syzbot] [mm?] WARNING in lock_list_lru_of_memcg (2)
From: Kairui Song
Date: Thu Nov 06 2025 - 00:58:40 EST
On Thu, Nov 6, 2025 at 10:58 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> +Kairui
Thanks for the Cc.
>
> On Wed, Nov 05, 2025 at 10:38:35AM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: ba36dd5ee6fd Merge tag 'bpf-fixes' of git://git.kernel.org..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16515704580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=e46b8a1c645465a9
> > dashboard link: https://syzkaller.appspot.com/bug?extid=c5b060ce82921a2fd500
> > compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> > userspace arch: i386
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/62471ef815ed/disk-ba36dd5e.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/e7a72af6e621/vmlinux-ba36dd5e.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/352eec7dbce0/bzImage-ba36dd5e.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+c5b060ce82921a2fd500@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 13908 at mm/list_lru.c:100 lock_list_lru_of_memcg+0x30c/0x4c0 mm/list_lru.c:100
>
> This is VM_WARN_ON(!css_is_dying(&memcg->css)) in
> lock_list_lru_of_memcg(). It is unexpected as it can only happen if
> (1) list_lru_from_memcg_idx() returns NULL or (2) lock_list_lru()
> finds that l->nr_items is LONG_MIN, which is set after CSS_DYING is set.
>
> I don't see how (2) can happen. For (1) to happen, somehow someone has
> deleted the given alive memcg's list_lru_memcg from shadow_nodes
> list_lru. Not sure how that can happen without some memory corruption or
> unsafe updates to shadow_nodes.
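To make the two cases above concrete, here is a rough userspace sketch of the fallback logic. It is a toy model, not the code in mm/list_lru.c: the toy_* names and struct layouts are made up for illustration, locking and RCU are left out, and the loop stopping at a NULL parent is my simplification.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every toy_* name is hypothetical. */
struct toy_lru_one {
	long nr_items;            /* LONG_MIN marks a list already reparented */
};

struct toy_memcg {
	bool dying;               /* models css_is_dying(&memcg->css) */
	struct toy_memcg *parent; /* models parent_mem_cgroup(memcg) */
	struct toy_lru_one *lru;  /* models list_lru_from_memcg_idx(); NULL if gone */
};

/* Models case (2): lock_list_lru() refuses a list marked with LONG_MIN. */
static bool toy_lock_list_lru(struct toy_lru_one *l)
{
	return l->nr_items != LONG_MIN;
}

/*
 * Models the retry loop: fall back to the parent when this memcg's list is
 * missing (case 1) or dead (case 2). *warned is set when that happens for a
 * memcg that is not dying, which is the condition the VM_WARN_ON checks.
 */
static struct toy_lru_one *toy_lock_of_memcg(struct toy_memcg *memcg,
					     bool *warned)
{
	*warned = false;
	while (memcg) {
		struct toy_lru_one *l = memcg->lru;

		if (l && toy_lock_list_lru(l))
			return l;
		if (!memcg->dying)
			*warned = true;
		memcg = memcg->parent;
	}
	return NULL;
}
```

In this model a dying child whose list is marked LONG_MIN falls back to its parent quietly, while a live child whose list pointer is NULL falls back with warned set. The second case is the one in this report.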
The last time I saw this, it was caused by memory corruption from other components:
https://lore.kernel.org/linux-mm/CAMgjq7Dxv4JwebBtR18_9TpNX_7ej5HXEN1s1sitB+H+4rCE-Q@xxxxxxxxxxxxxx/
Another time, it was caused by a shadow node allocation that was missing mapping_set_update:
https://lore.kernel.org/linux-mm/20241222122936.67501-1-ryncsn@xxxxxxxxx/
>
> I think we need to wait for syzbot to generate a reproducer to debug
> further.
Agreed. This part has been very stable for a year, so it is hard to tell
whether another allocation is missing the xas_set_lru callback or
something else is wrong. The worst that could happen now is a minor
memory accounting leak.
I'll have a look from the code side when I have time.