Re: [syzbot] [mm?] WARNING in vms_complete_munmap_vmas

From: Lorenzo Stoakes
Date: Fri Oct 11 2024 - 11:26:05 EST


On Tue, Oct 08, 2024 at 01:25:31PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 80cb3fb61135 Merge branch 'for-next/core', remote-tracking..

I can't find this commit hash any more :) presumably rebased.

Also, with the head being a merge commit, this is obviously not a hugely
useful bisect point. I wonder if there was some other problematic patch in
this rebase...

> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci

Bit strange that the head commit references a different branch than this
one... presumably both were at the same commit at the time?

> console output: https://syzkaller.appspot.com/x/log.txt?x=137aa7d0580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=b9f31443a725c681
> dashboard link: https://syzkaller.appspot.com/bug?extid=38c3a8b50658644abaca
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14f94327980000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=177aa7d0580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/f883f65fbfeb/disk-80cb3fb6.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d950aa1c78a2/vmlinux-80cb3fb6.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/972c4d758a0b/Image-80cb3fb6.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+38c3a8b50658644abaca@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 6413 at mm/vma.c:1147 vms_complete_munmap_vmas+0x6f4/0x840 mm/vma.c:1147

This is

VM_WARN_ON(vms->stack_vm > mm->stack_vm);

But I cannot see what could possibly have caused this. We've not changed
how this accounting is done, only added a check that we don't underflow
this counter.
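
To spell out what that check guards against, here is a minimal userspace
sketch. It is not the kernel code: the struct and function names are made up
for illustration, and only the stack counter is modelled. The point is that
the counter is unsigned, so removing more pages than were accounted wraps
around rather than going negative.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-mm page counters. */
struct mm_counters {
	unsigned long stack_vm;	/* pages currently accounted as stack */
};

/* Hypothetical stand-in for the counters gathered for one unmap. */
struct unmap_counters {
	unsigned long stack_vm;	/* stack pages in the range being unmapped */
};

/*
 * Models the idea behind VM_WARN_ON(vms->stack_vm > mm->stack_vm): warn when
 * the subtraction would wrap. Returns true if the accounting was consistent.
 */
static bool account_unmap(struct mm_counters *mm,
			  const struct unmap_counters *vms)
{
	bool ok = vms->stack_vm <= mm->stack_vm;

	if (!ok)
		fprintf(stderr,
			"WARNING: removing %lu stack pages, only %lu accounted\n",
			vms->stack_vm, mm->stack_vm);

	/* The warning does not stop the update; unsigned arithmetic wraps. */
	mm->stack_vm -= vms->stack_vm;
	return ok;
}
```

In this model, unmapping 2 pages from an mm with 1 accounted stack page
warns and leaves the counter at ULONG_MAX.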

Very strange for this to fail only now and only here, and for it to have
reproduced only on this now-deleted commit, and only once (I see on the
dashboard it was tried several times before).

I wonder if another patch has impacted this somehow...

> Modules linked in:
> CPU: 0 UID: 0 PID: 6413 Comm: syz-executor308 Not tainted 6.12.0-rc1-syzkaller-g80cb3fb61135 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : vms_complete_munmap_vmas+0x6f4/0x840 mm/vma.c:1147
> lr : vms_complete_munmap_vmas+0x6f4/0x840 mm/vma.c:1147
> sp : ffff80009bc27550
> x29: ffff80009bc275a0 x28: ffff0000d695f800 x27: 0000000000000c00
> x26: ffff0000d695fa00 x25: ffff80009bc277d0 x24: ffff0000d695f9f8
> x23: ffff80009bc277c8 x22: 0000000000000021 x21: 00000000000010dd
> x20: 1ffff00013784ef6 x19: dfff800000000000 x18: ffff80009bc26b60
> x17: 000000000000d6db x16: ffff80008b3bde40 x15: 0000000000000010
> x14: 1ffff00013784e84 x13: 0000000000000000 x12: 0000000000000000
> x11: ffff700013784e94 x10: 0000000000ff0100 x9 : 0000000000000000
> x8 : ffff0000c2c88000 x7 : 0000000000000000 x6 : 000000000000003f
> x5 : 0000000000000040 x4 : ffffffffffffffe0 x3 : 0000000000000020
> x2 : 0000000000000000 x1 : 0000000000000021 x0 : 0000000000000c00
> Call trace:
> vms_complete_munmap_vmas+0x6f4/0x840 mm/vma.c:1147
> mmap_region+0xc68/0x1e28 mm/mmap.c:1533

There are two possible mmap() calls that could trigger this; both are
hugetlb mappings and both overwrite existing mappings.

There is some uffd logic between them, however.
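
For reference, the general shape of an mmap() that overwrites an existing
mapping is sketched below. This is not the syzbot reproducer: it uses plain
anonymous memory rather than hugetlb, has no uffd involvement, and the
function name is made up. It only shows the kind of call that, per the trace
above, ends up in the unmap completion path from within mmap_region().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map an anonymous region, then map over the same range with MAP_FIXED.
 * The second mmap() implicitly unmaps the first mapping before installing
 * the new one. Returns 0 on success, -1 on failure.
 */
static int remap_over_existing(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	void *first, *second;

	if (page <= 0)
		return -1;
	len = 4 * (size_t)page;

	first = mmap(NULL, len, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (first == MAP_FAILED)
		return -1;

	/* Overwrite the existing mapping in place. */
	second = mmap(first, len, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (second != first) {
		munmap(first, len);
		return -1;
	}

	return munmap(second, len);
}
```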

> do_mmap+0x7e0/0xe00 mm/mmap.c:496
> vm_mmap_pgoff+0x1a0/0x38c mm/util.c:588
> ksys_mmap_pgoff+0x3f0/0x5c8 mm/mmap.c:542
> __do_sys_mmap arch/arm64/kernel/sys.c:28 [inline]
> __se_sys_mmap arch/arm64/kernel/sys.c:21 [inline]
> __arm64_sys_mmap+0xf8/0x110 arch/arm64/kernel/sys.c:21
> __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
> invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
> el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
> do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
> el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:732
> el0t_64_sync_handler+0x84/0x108 arch/arm64/kernel/entry-common.c:750
> el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> irq event stamp: 8790
> hardirqs last enabled at (8789): [<ffff80008047a578>] se
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup

Overall I wonder if this is a genuine issue or an interaction with some
other change. We're pretty stuck when the repro doesn't repro and the
referenced commit doesn't exist anywhere.

If this is real, I suspect it must be arm64-specific. But given that things
should be write-locked when these counters are updated, I just cannot see
how we could subtract more from the stack counter than was previously
there.

It's suspicious that in [0] we can see that "arm64/mm: Implement
map_shadow_stack()" is present, and we account shadow stack in this counter.

I wonder if there is a bug in accounting pages in that code?

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=80cb3fb6113554f316c79901354b2a3c81479bf5

Will see if I can get an arm64 setup going here.