Re: KCSAN: data-race in task_dump_owner / task_dump_owner

From: Marco Elver
Date: Thu Oct 17 2019 - 14:33:13 EST


On Thu, 17 Oct 2019 at 20:17, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>
> On Thu, Oct 17, 2019 at 02:56:47PM +0200, Marco Elver wrote:
> > Hi,
> >
> > On Thu, 17 Oct 2019 at 14:36, syzbot
> > <syzbot+e392f8008a294fdf8891@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: d724f94f x86, kcsan: Enable KCSAN for x86
> > > git tree: https://github.com/google/ktsan.git kcsan
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=17884db3600000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=c0906aa620713d80
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e392f8008a294fdf8891
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+e392f8008a294fdf8891@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > ==================================================================
> > > BUG: KCSAN: data-race in task_dump_owner / task_dump_owner
> > >
> > > write to 0xffff8881255bb7fc of 4 bytes by task 7804 on cpu 0:
> > > task_dump_owner+0xd8/0x260 fs/proc/base.c:1742
> > > pid_update_inode+0x3c/0x70 fs/proc/base.c:1818
> > > pid_revalidate+0x91/0xd0 fs/proc/base.c:1841
> > > d_revalidate fs/namei.c:765 [inline]
> > > d_revalidate fs/namei.c:762 [inline]
> > > lookup_fast+0x7cb/0x7e0 fs/namei.c:1613
> > > walk_component+0x6d/0xe80 fs/namei.c:1804
> > > link_path_walk.part.0+0x5d3/0xa90 fs/namei.c:2139
> > > link_path_walk fs/namei.c:2070 [inline]
> > > path_openat+0x14f/0x3530 fs/namei.c:3532
> > > do_filp_open+0x11e/0x1b0 fs/namei.c:3563
> > > do_sys_open+0x3b3/0x4f0 fs/open.c:1089
> > > __do_sys_open fs/open.c:1107 [inline]
> > > __se_sys_open fs/open.c:1102 [inline]
> > > __x64_sys_open+0x55/0x70 fs/open.c:1102
> > > do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
> > > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > write to 0xffff8881255bb7fc of 4 bytes by task 7813 on cpu 1:
> > > task_dump_owner+0xd8/0x260 fs/proc/base.c:1742
> > > pid_update_inode+0x3c/0x70 fs/proc/base.c:1818
> > > pid_revalidate+0x91/0xd0 fs/proc/base.c:1841
> > > d_revalidate fs/namei.c:765 [inline]
> > > d_revalidate fs/namei.c:762 [inline]
> > > lookup_fast+0x7cb/0x7e0 fs/namei.c:1613
> > > walk_component+0x6d/0xe80 fs/namei.c:1804
> > > lookup_last fs/namei.c:2271 [inline]
> > > path_lookupat.isra.0+0x13a/0x5a0 fs/namei.c:2316
> > > filename_lookup+0x145/0x2d0 fs/namei.c:2346
> > > user_path_at_empty+0x4c/0x70 fs/namei.c:2606
> > > user_path_at include/linux/namei.h:60 [inline]
> > > vfs_statx+0xd9/0x190 fs/stat.c:187
> > > vfs_stat include/linux/fs.h:3188 [inline]
> > > __do_sys_newstat+0x51/0xb0 fs/stat.c:341
> > > __se_sys_newstat fs/stat.c:337 [inline]
> > > __x64_sys_newstat+0x3a/0x50 fs/stat.c:337
> > > do_syscall_64+0xcf/0x2f0 arch/x86/entry/common.c:296
> > > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 1 PID: 7813 Comm: ps Not tainted 5.3.0+ #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > ==================================================================
> >
> > My understanding is, that for every access to /proc/<pid>,
> > d_revalidate is called, and /proc-fs implementation simply says that
> > pid_revalidate always revalidates by rewriting uid/gid because "owning
> > task may have performed a setuid(), etc." presumably so every access
> > to a /proc/<pid> entry always has the right uid/gid (in effect
> > updating /proc/<pid> lazily via d_revalidate).
> >
> > Is it possible that one of the tasks above could be preempted after
> > doing its writes to *ruid/*rgid, another thread writing some other
> > values (after setuid / seteuid), and then the preempted thread seeing
> > the other values? Assertion here should never fail:
> > === TASK 1 ===
> > | seteuid(1000);
> > | seteuid(0);
> > | stat("/proc/<pid-of-task-1>", &fstat);
> > | assert(fstat.st_uid == 0);
> > === TASK 2 ===
> > | stat("/proc/<pid-of-task-1>", ...);
>
> Is it the same as
> pid_revalidate() snapshots (uid,gid) correctly
> but writeback is done in any order?

Yes, I think so. Snapshot is done in RCU reader critical section, but
the writes can race with another thread. Is there logic that ensures
this doesn't lead to the observable outcome above?