Re: [PATCH] xfs: fix data races in inode->i_*time

From: Darrick J. Wong
Date: Tue Feb 25 2020 - 15:28:38 EST


On Tue, Feb 25, 2020 at 03:09:45PM -0500, Qian Cai wrote:
> inode->i_*time could be accessed concurrently. The plain reads in
> xfs_vn_getattr() is lockless which result in data races. To avoid bad
> compiler optimizations like load tearing, adding pairs of
> READ|WRITE_ONCE(). While at it, also take care of xfs_setattr_time()
> which presumably could run concurrently with xfs_vn_getattr() as well.
> The data races were reported by KCSAN,
>
> write to 0xffff9275a1920ad8 of 16 bytes by task 47311 on cpu 46:
> xfs_vn_update_time+0x1b0/0x400 [xfs]
> xfs_vn_update_time at fs/xfs/xfs_iops.c:1122
> update_time+0x57/0x80
> file_update_time+0x143/0x1f0
> __xfs_filemap_fault+0x1be/0x3d0 [xfs]
> xfs_filemap_page_mkwrite+0x25/0x40 [xfs]
> do_page_mkwrite+0xf7/0x250
> do_fault+0x679/0x920
> __handle_mm_fault+0xc9f/0xd40
> handle_mm_fault+0xfc/0x2f0
> do_page_fault+0x263/0x6f9
> page_fault+0x34/0x40
>
> 4 locks held by doio/47311:
> #0: ffff9275e7d70808 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9
> #1: ffff9274864394d8 (sb_pagefaults){.+.+}, at: __xfs_filemap_fault+0x19b/0x3d0 [xfs]
> #2: ffff9274864395b8 (sb_internal){.+.+}, at: xfs_trans_alloc+0x2af/0x3c0 [xfs]
> #3: ffff9275a1920920 (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0x116/0x2c0 [xfs]
> irq event stamp: 42649
> hardirqs last enabled at (42649): [<ffffffffb22dcdb3>] _raw_spin_unlock_irqrestore+0x53/0x60
> hardirqs last disabled at (42648): [<ffffffffb22dcad1>] _raw_spin_lock_irqsave+0x21/0x60
> softirqs last enabled at (42306): [<ffffffffb260034c>] __do_softirq+0x34c/0x57c
> softirqs last disabled at (42299): [<ffffffffb18c6762>] irq_exit+0xa2/0xc0
>
> read to 0xffff9275a1920ad8 of 16 bytes by task 47312 on cpu 40:
> xfs_vn_getattr+0x20c/0x6a0 [xfs]
> xfs_vn_getattr at fs/xfs/xfs_iops.c:551
> vfs_getattr_nosec+0x11a/0x170
> vfs_statx_fd+0x54/0x90
> __do_sys_newfstat+0x40/0x90
> __x64_sys_newfstat+0x3a/0x50
> do_syscall_64+0x91/0xb05
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> no locks held by doio/47312.
> irq event stamp: 43883
> hardirqs last enabled at (43883): [<ffffffffb1805119>] do_syscall_64+0x39/0xb05
> hardirqs last disabled at (43882): [<ffffffffb1803ede>] trace_hardirqs_off_thunk+0x1a/0x1c
> softirqs last enabled at (43844): [<ffffffffb260034c>] __do_softirq+0x34c/0x57c
> softirqs last disabled at (43141): [<ffffffffb18c6762>] irq_exit+0xa2/0xc0
>
> Signed-off-by: Qian Cai <cai@xxxxxx>
> ---
> fs/xfs/xfs_iops.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 81f2f93caec0..2d5ca13ee9da 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -547,9 +547,9 @@
> stat->uid = inode->i_uid;
> stat->gid = inode->i_gid;
> stat->ino = ip->i_ino;
> - stat->atime = inode->i_atime;
> - stat->mtime = inode->i_mtime;
> - stat->ctime = inode->i_ctime;
> + stat->atime = READ_ONCE(inode->i_atime);
> + stat->mtime = READ_ONCE(inode->i_mtime);
> + stat->ctime = READ_ONCE(inode->i_ctime);

Seeing as one is supposed to take ILOCK_SHARED before reading inode core
information, why don't we do that here? Is there some huge performance
benefit to be realized from READ_ONCE vs. waiting for the lock that
protects all the writes from each other?

--D

> stat->blocks =
> XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
>
> @@ -614,11 +614,11 @@
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>
> if (iattr->ia_valid & ATTR_ATIME)
> - inode->i_atime = iattr->ia_atime;
> + WRITE_ONCE(inode->i_atime, iattr->ia_atime);
> if (iattr->ia_valid & ATTR_CTIME)
> - inode->i_ctime = iattr->ia_ctime;
> + WRITE_ONCE(inode->i_ctime, iattr->ia_ctime);
> if (iattr->ia_valid & ATTR_MTIME)
> - inode->i_mtime = iattr->ia_mtime;
> + WRITE_ONCE(inode->i_mtime, iattr->ia_mtime);
> }
>
> static int
> @@ -1117,11 +1117,11 @@
>
> xfs_ilock(ip, XFS_ILOCK_EXCL);
> if (flags & S_CTIME)
> - inode->i_ctime = *now;
> + WRITE_ONCE(inode->i_ctime, *now);
> if (flags & S_MTIME)
> - inode->i_mtime = *now;
> + WRITE_ONCE(inode->i_mtime, *now);
> if (flags & S_ATIME)
> - inode->i_atime = *now;
> + WRITE_ONCE(inode->i_atime, *now);
>
> xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> xfs_trans_log_inode(tp, ip, log_flags);
> --
> 1.8.3.1
>