Re: [syzbot] [mm?] KCSAN: data-race in exec_mmap / vms_clear_ptes

From: Liam R. Howlett
Date: Fri Oct 11 2024 - 11:12:34 EST


* Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> [240930 13:47]:
> On Mon, Sep 30, 2024 at 12:39:24AM GMT, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: e7ed34365879 Merge tag 'mailbox-v6.12' of git://git.kernel..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15238ea9980000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=8bfe37bd3f5983d6
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d207c41e97001109b49d
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/c86c1297298e/disk-e7ed3436.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/8313c0846b3b/vmlinux-e7ed3436.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/8af10d85db09/bzImage-e7ed3436.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d207c41e97001109b49d@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ==================================================================
> > BUG: KCSAN: data-race in exec_mmap / vms_clear_ptes
> >
> > write to 0xffff888102fbaae8 of 8 bytes by task 3004 on cpu 1:
> > update_hiwater_rss include/linux/mm.h:2655 [inline]
>
> OK so this sets mm_struct->hiwater_rss on munmap() via update_hiwater_rss() in
> vms_clear_ptes()...
>
> > vms_clear_ptes+0x1a7/0x300 mm/vma.c:1088
> > vms_complete_munmap_vmas+0x170/0x480 mm/vma.c:1140
> > do_vmi_align_munmap+0x349/0x390 mm/vma.c:1349
> > do_vmi_munmap+0x1eb/0x230 mm/vma.c:1397
> > __vm_munmap+0xfd/0x220 mm/mmap.c:1600
> > __do_sys_munmap mm/mmap.c:1617 [inline]
> > __se_sys_munmap mm/mmap.c:1614 [inline]
> > __x64_sys_munmap+0x36/0x40 mm/mmap.c:1614
> > x64_sys_call+0xd32/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:12
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >
>
> > read to 0xffff888102fbaae8 of 8 bytes by task 5494 on cpu 0:
> > get_mm_hiwater_rss include/linux/mm.h:2642 [inline]
>
> ...And a racing execve() is trying to read it.
>
> This is a bit of a tricky race, as we downgrade the write lock to a read lock in
> vms_complete_munmap_vmas(), which allows exec_mmap() to proceed (as it's under a
> read lock), and update_hiwater_rss() is called against vms->vma->vm_mm which is
> set to the old_mm and...
>
> I wonder if we should just be referencing current->mm in vms_clear_ptes() which
> would avoid this as will be the new mm that has been execve'd in (and presumably
> do nothing)?
>
> I don't think in practice this should have too egregious an impact as the
> process is being execve()'d anyway, so I think we can wait for Liam to return
> from leave to give his input.
>
> On your return - Liam what do you think? :)

I think this always existed so I'm not sure how it's showing up now.

Before my change, the counter was updated in unmap_region(), which
happened after the potential mmap lock downgrade to read.

Perhaps the change altered the timing enough to cause this to be
detected, or saving a pointer in a struct is detected easier than
passing a pointer in an argument list.

If we change it to use current->mm, then we will be racing with the
switching of the current mm and the use of the current mm to update the
counter. We may update the new or old mm counter, depending on what
side the update lands when finding current.

I don't think it really matters and I'm happy to mark it as a race.