Re: [syzbot] [mm?] WARNING in zswap_folio_swapin

From: Chengming Zhou
Date: Sat Feb 03 2024 - 21:59:48 EST


On 2024/2/4 09:28, Nhat Pham wrote:
> On Sat, Feb 3, 2024 at 12:37 PM syzbot
> <syzbot+17a611d10af7d18a7092@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 861c0981648f Merge tag 'jfs-6.8-rc3' of github.com:kleikam..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=174537bbe80000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=b168fa511db3ca08
>> dashboard link: https://syzkaller.appspot.com/bug?extid=17a611d10af7d18a7092
>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>> userspace arch: i386
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-861c0981.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/b2b204c7b4a0/vmlinux-861c0981.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/170ec316e557/bzImage-861c0981.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+17a611d10af7d18a7092@xxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> kcov_ioctl+0x4f/0x720 kernel/kcov.c:704
>> __do_compat_sys_ioctl+0x2bf/0x330 fs/ioctl.c:971
>> do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
>> __do_fast_syscall_32+0x79/0x110 arch/x86/entry/common.c:321
>> page has been migrated, last migrate reason: compaction
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 5104 at include/linux/memcontrol.h:775 folio_lruvec include/linux/memcontrol.h:775 [inline]
>> WARNING: CPU: 2 PID: 5104 at include/linux/memcontrol.h:775 zswap_folio_swapin+0x47d/0x5a0 mm/zswap.c:381
>> Modules linked in:
>> CPU: 2 PID: 5104 Comm: syz-fuzzer Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> RIP: 0010:folio_lruvec include/linux/memcontrol.h:775 [inline]
>
> Hmm looks like it's this line:
> VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio);
>
> Looks like memcg was cleared from the folio. Haven't looked too
> closely yet, but this (and the "page has been migrated" line above)
> suggests maybe there is some migration business going on -
> mem_cgroup_migrate() clears the old folio's memcg_data (via
> old->memcg_data = 0).

Yeah, I think it's this case.

>
> Here's my theory (which could be wrong - someone please fact-check
> me): swap_read_folio(), which precedes zswap_folio_swapin(), unlocks

Another case is !page_allocated, where the returned folio is also unlocked, right?

> the folio. Could this be sufficient to allow for migration? If this is

IMHO, holding the folio lock is sufficient to prevent concurrent memcg migration.

> the case, all we need to do is move this to above swap_read_folio(),
> while the folio is still locked. __read_swap_cache_async() already
> charges the folio to a memcg, so no need to wait till after
> swap_read_folio() anyway.

Should we call zswap_folio_swapin() in the !page_allocated case?
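
To make the ordering argument concrete, here is a small userspace model of
the suspected window. It is only a sketch and not kernel code: every name
in it (model_folio, model_try_migrate, model_swap_read, model_swapin_hook,
model_swapin) is hypothetical, and it assumes the theory above is right,
i.e. that migration needs the folio lock and clears the old folio's
memcg_data. It does not cover the !page_allocated case.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <stdbool.h>

struct model_folio {
	bool locked;
	unsigned long memcg_data;	/* 0 models "no memcg attached" */
};

/*
 * Models migration of the old folio: assumed to need the folio lock,
 * and to clear memcg_data on success (as old->memcg_data = 0 does).
 */
static bool model_try_migrate(struct model_folio *old)
{
	if (old->locked)
		return false;	/* lock is held, migration cannot proceed */
	old->memcg_data = 0;
	return true;
}

/* Models the read path, which is assumed to unlock the folio when done. */
static void model_swap_read(struct model_folio *folio)
{
	folio->locked = false;
}

/* Models the memcg check: true if the folio still has a memcg. */
static bool model_swapin_hook(const struct model_folio *folio)
{
	return folio->memcg_data != 0;
}

/*
 * Runs one modelled swapin with a worst-case migration attempt placed
 * right before the hook. Returns true if the hook saw a valid memcg.
 */
static bool model_swapin(bool hook_before_read)
{
	struct model_folio folio = { .locked = true, .memcg_data = 0x1234 };
	bool ok;

	if (hook_before_read) {
		/* Folio is still locked here, so migration is refused. */
		model_try_migrate(&folio);
		ok = model_swapin_hook(&folio);
		model_swap_read(&folio);
		return ok;
	}

	/* Current order: the read unlocks first, opening the window. */
	model_swap_read(&folio);
	model_try_migrate(&folio);
	return model_swapin_hook(&folio);
}
```

In this model, model_swapin(false) (hook after the read) loses the memcg,
while model_swapin(true) (hook while still locked) keeps it. That is all
the proposed reordering relies on.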

Thanks.