Re: [syzbot] [mm?] [bcachefs?] WARNING in lock_list_lru_of_memcg

From: Alan Huang
Date: Tue Feb 18 2025 - 07:17:52 EST


On Feb 18, 2025, at 19:40, Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@xxxxxxxxx> wrote:
>>
>> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@xxxxxxxxx> wrote:
>>>
>>> On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>>>>
>>>> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> Thanks. I doubt if bcachefs is implicated in this?
>>>>>
>>>>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
>>>>>> git tree: upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
>>>>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
>>>>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+38a0cbd267eff2d286ff@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
>>>>>
>>>>> VM_WARN_ON(!css_is_dying(&memcg->css));
>>>>
>>>> I'm checking this. The last time this was triggered, it was caused by
>>>> a list_lru user that did not initialize the memcg list_lru properly
>>>> before list_lru reclaim started, and it was fixed by:
>>>> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@xxxxxxxxx/T/
>>>>
>>>> This shouldn't be a big issue. Maybe there are leaks that will be
>>>> fixed upon reparenting, and this newly added sanity check might be
>>>> too lenient; I'm not 100% sure though.
>>>>
>>>> Unfortunately I couldn't reproduce the issue locally with the
>>>> reproducer yet. I will keep the test running and see if it can hit
>>>> this WARN_ON.
>>>
>>> So far I am still unable to trigger this VM_WARN_ON using the
>>> reproducer, and I'm seeing many other random crashes.
>>>
>>> But after I changed the .config a bit to add more debug options
>>> (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), the following crash showed
>>> up, and it triggers immediately after I start the test:
>>>
>>> [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
>>> [ T1242] #PF: supervisor read access in kernel mode
>>> [ T1242] #PF: error_code(0x0000) - not-present page
>>> [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE 800fffffab39f060
>>> [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
>>> [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted 6.14.0-rc2-00185-g128c8f96eb86 #2
>>> [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
>>> [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T6058] bcachefs (loop2): empty btree root xattrs
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000) knlGS:0000000000000000
>>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Call Trace:
>>> [ T1242] <TASK>
>>> [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
>>> [ T1242] btree_node_read_work+0x54d/0xdc0
>>> [ T1242] process_scheduled_works+0xaf8/0x17f0
>>> [ T1242] worker_thread+0x89d/0xd60
>>> [ T1242] kthread+0x722/0x890
>>> [ T1242] ret_from_fork+0x4e/0x80
>>> [ T1242] ret_from_fork_asm+0x1a/0x30
>>> [ T1242] </TASK>
>>> [ T1242] Modules linked in:
>>> [ T1242] ---[ end trace 0000000000000000 ]---
>>> [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
>>> [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
>>> 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
>>> ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
>>> [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
>>> [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
>>> [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
>>> [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
>>> [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
>>> [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
>>> [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000) knlGS:0000000000000000
>>> [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
>>> [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [ T1242] Kernel panic - not syncing: Fatal exception
>>> [ T1242] Kernel Offset: disabled
>>> [ T1242] Rebooting in 86400 seconds..
>>>
>>> It's caused by the memmove_u64s_down in validate_bset_keys of
>>> fs/bcachefs/btree_io.c:
>>> -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>>
>>
>> Might need this.
>>
>> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
>> index e71b278672b6..fb53174cb735 100644
>> --- a/fs/bcachefs/btree_io.c
>> +++ b/fs/bcachefs/btree_io.c
>> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
>> }
>> got_good_key:
>> le16_add_cpu(&i->u64s, -next_good_key);
>> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>> + memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
>> set_btree_node_need_rewrite(b);
>> }
>> fsck_err:
>>
>
> Thanks, but this didn't fix everything. I think the problem is more
> complex: syzbot seems to be trying to mount a damaged bcachefs image
> (on purpose, I think), so vstruct_end(i) is already returning an
> address that is out of bounds.

Could you try this (I need to go out now):

diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e71b278672b6..80a0094be356 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
}
got_good_key:
le16_add_cpu(&i->u64s, -next_good_key);
- memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
+ memmove_u64s_down(k, (u64 *) k + next_good_key, (u64 *) vstruct_end(i) - (u64 *) k);
set_btree_node_need_rewrite(b);
}
fsck_err:
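
For reference, here is a small user-space sketch of why the source pointer
matters in that call. It is only a model: drop_u64s(), move_u64s_down() and
read_end_index() are made-up names, not bcachefs functions, and a plain
uint64_t array stands in for the bset. The count is computed after i->u64s
has already been reduced by next_good_key, so it equals the number of u64s
that follow the skipped region. That only lines up if the source is
k + next_good_key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for memmove_u64s_down(): move n u64s to a lower address. */
static void move_u64s_down(uint64_t *dst, const uint64_t *src, size_t n)
{
	memmove(dst, src, n * sizeof(uint64_t));
}

/*
 * Drop `skip` u64s starting at index `k` from a buffer that currently
 * holds `*u64s` valid u64s, then close the gap.
 */
static void drop_u64s(uint64_t *data, uint16_t *u64s, size_t k, size_t skip)
{
	assert(k + skip <= *u64s);

	/* Shrink the length first, like le16_add_cpu(&i->u64s, -next_good_key). */
	*u64s = (uint16_t)(*u64s - skip);

	/* New end minus k is exactly what remains after the skipped region. */
	move_u64s_down(data + k, data + k + skip, *u64s - k);
}

/*
 * Index one past the last u64 read when the count is (new end - k) but the
 * source starts `src_off` u64s after k. Anything above `old_u64s` is a read
 * past the old end of the data. The starting index k cancels out.
 */
static size_t read_end_index(size_t old_u64s, size_t skip, size_t src_off)
{
	return old_u64s - skip + src_off;
}
```

With eight u64s, dropping three at index 2 leaves five and reads exactly up
to the old end. If the source offset were 5 instead of 3 (for example a key
whose own u64s is larger than next_good_key), the same count would read two
u64s past the old end.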

>
> I retriggered it and printed some more debug info: i->_data is
> ffff88806d5c00a0, i->u64s is 60928, and the faulting address is
> ffff88806d600000.
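
Working through those numbers, assuming vstruct_end(i) is simply i->_data
plus i->u64s 8-byte words (the helper names below are hypothetical and only
there for the arithmetic):

```c
#include <stdint.h>

/* Address one past the payload: data + u64s * 8 bytes. */
static uint64_t payload_end(uint64_t data, uint64_t u64s)
{
	return data + u64s * sizeof(uint64_t);
}

/* Number of whole u64s that fit between data and a limit address. */
static uint64_t u64s_until(uint64_t data, uint64_t limit)
{
	return (limit - data) / sizeof(uint64_t);
}
```

With i->_data = ffff88806d5c00a0 and i->u64s = 60928, the computed end is
ffff88806d6370a0, which is 0x370a0 bytes beyond the faulting address. Only
32748 u64s fit between i->_data and ffff88806d600000, so if the mapping
really stops at the faulting address, the on-disk u64s is far too large.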