Re: [PATCH] fs: fix data races on inode->i_flctx

From: Jeff Layton
Date: Mon Sep 21 2015 - 07:44:19 EST


On Mon, 21 Sep 2015 13:38:45 +0200
Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:

> Yes. That processor is Alpha and that's documented in DATA DEPENDENCY
> BARRIERS section of Documentation/memory-barriers.txt.
>
>

Ok, thanks for the explanation. Patch looks fine to me. I'll go ahead
and merge it for v4.4. Let me know though if you think this needs to go
in sooner.

Thanks,
Jeff

>
> On Mon, Sep 21, 2015 at 1:17 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > On Mon, 21 Sep 2015 13:00:04 +0200
> > Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> >> On Mon, Sep 21, 2015 at 12:56 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> >> > On Mon, 21 Sep 2015 12:53:40 +0200
> >> > Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >> >
> >> >> On Mon, Sep 21, 2015 at 12:50 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> >> >> > On Mon, 21 Sep 2015 09:43:06 +0200
> >> >> > Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >> >> >
> >> >> >> locks_get_lock_context() uses cmpxchg() to install i_flctx.
> >> >> >> cmpxchg() is a release operation which is correct. But it uses
> >> >> >> a plain load to load i_flctx. This is incorrect. Subsequent loads
> >> >> >> from i_flctx can hoist above the load of i_flctx pointer itself
> >> >> >> and observe uninitialized garbage there. This in turn can lead
> >> >> >> to corruption of ctx->flc_lock and other members.
> >> >> >>
> >> >> >
> >> >> > I don't get this bit. The i_flctx field is initialized to zero when the
> >> >> > inode is allocated, and we only assign it with cmpxchg(). How would you
> >> >> > ever see uninitialized garbage in there? It should either be NULL or a
> >> >> > valid pointer, no?
> >> >>
> >> >> This is not about i_flctx pointer, this is about i_flctx object
> >> >> contents (pointee).
> >> >> Readers can see a non-NULL pointer, but read garbage from *i_flctx.
> >> >>
> >> >
> >> > Again, I don't get it though. The cmpxchg happens after we've
> >> > initialized the structure. Given that there are implicit full memory
> >> > barriers before and after the cmpxchg(), how do you end up with another
> >> > task seeing the pointer before it was ever initialized?
> >> >
> >> > The store should only happen after the initialization has occurred, and
> >> > the loads in the other tasks shouldn't be able to see the results of
> >> > that store until then. No?
> >>
> >> If a reader looks at things in the opposite order, then it does not
> >> matter how you store them. Reader still can observe them in the wrong
> >> order.
> >> Memory barriers is always a game of two. Writer needs to do stores in
> >> the correct order and reader must do reads in the correct order. A
> >> single memory barrier cannot possible make any sense.
> >>
> >
> > I get that, but...isn't there a data dependency there? In order for the
> > reader to see bogus fields inside of i_flctx, it needs to be able to
> > see a valid pointer in i_flctx...and in order for that to occur the
> > store has to have occurred.
> >
> > I guess the concern is that the reader CPU could stumble across some
> > memory that is eventually going to part of i_flctx prior to loading
> > that pointer, and assume that its contents are still valid after the
> > pointer store has occurred? Is that the basic scenario?
> >
> >>
> >> >> > If that'st the case, then most of the places where you're adding
> >> >> > smp_load_acquire are places that can tolerate seeing NULL there. e.g.
> >> >> > you have racing LOCK_EX/LOCK_UN flock() calls in different tasks or
> >> >> > something.
> >> >> >
> >> >> >> Documentation/memory-barriers.txt explicitly requires to use
> >> >> >> a barrier in such context:
> >> >> >> "A load-load control dependency requires a full read memory barrier".
> >> >> >>
> >> >> >
> >> >> > The same file also says:
> >> >> >
> >> >> > "Any atomic operation that modifies some state in memory and returns information
> >> >> > about the state (old or new) implies an SMP-conditional general memory barrier
> >> >> > (smp_mb()) on each side of the actual operation (with the exception of
> >> >> > explicit lock operations, described later). These include:
> >> >> >
> >> >> > ...
> >> >> > /* when succeeds */
> >> >> > cmpxchg();
> >> >> > "
> >> >> >
> >> >> > If there is an implied smp_mb() before and after the cmpxchg(), how
> >> >> > could the CPU hoist anything before that point?
> >> >> >
> >> >> > Note that I'm not saying that you're wrong, I just want to make sure I
> >> >> > fully understand the problem before we go trying to fix it.
> >> >> >
> >> >> >> Use smp_load_acquire() in locks_get_lock_context() and in bunch
> >> >> >> of other functions that can proceed concurrently with
> >> >> >> locks_get_lock_context().
> >> >> >>
> >> >> >> The data race was found with KernelThreadSanitizer (KTSAN).
> >> >> >>
> >> >> >> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> >> >> >> ---
> >> >> >> For the record, here is the KTSAN report:
> >> >> >>
> >> >> >> ThreadSanitizer: data-race in _raw_spin_lock
> >> >> >>
> >> >> >> Read at 0xffff8800bae50da0 of size 1 by thread 3060 on CPU 2:
> >> >> >> [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
> >> >> >> [<ffffffff81ee37d0>] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
> >> >> >> [< inline >] spin_lock include/linux/spinlock.h:312
> >> >> >> [<ffffffff812e1187>] flock_lock_inode+0xb7/0x440 fs/locks.c:895
> >> >> >> [<ffffffff812e16e6>] flock_lock_inode_wait+0x46/0x110 fs/locks.c:1871
> >> >> >> [<ffffffff812e2814>] SyS_flock+0x224/0x23
> >> >> >>
> >> >> >> Previous write at 0xffff8800bae50da0 of size 8 by thread 3063 on CPU 8:
> >> >> >> [<ffffffff81248628>] kmem_cache_alloc+0xd8/0x2f0 mm/slab.c:3420
> >> >> >> [<ffffffff812ddba0>] locks_get_lock_context+0x60/0x140 fs/locks.c:213
> >> >> >> [<ffffffff812e112a>] flock_lock_inode+0x5a/0x440 fs/locks.c:882
> >> >> >> [<ffffffff812e16e6>] flock_lock_inode_wait+0x46/0x110 fs/locks.c:1871
> >> >> >> [< inline >] flock_lock_file_wait include/linux/fs.h:1219
> >> >> >> [< inline >] SYSC_flock fs/locks.c:1941
> >> >> >> [<ffffffff812e2814>] SyS_flock+0x224/0x230 fs/locks.c:1904
> >> >> >> [<ffffffff81ee3e11>] entry_SYSCALL_64_fastpath+0x31/0x95 arch/x86/entry/entry_64.S:188
> >> >> >> ---
> >> >> >> fs/locks.c | 63 +++++++++++++++++++++++++++++++++++---------------------------
> >> >> >> 1 file changed, 36 insertions(+), 27 deletions(-)
> >> >> >>
> >> >> >> diff --git a/fs/locks.c b/fs/locks.c
> >> >> >> index 2a54c80..316e474 100644
> >> >> >> --- a/fs/locks.c
> >> >> >> +++ b/fs/locks.c
> >> >> >> @@ -205,28 +205,32 @@ static struct kmem_cache *filelock_cache __read_mostly;
> >> >> >> static struct file_lock_context *
> >> >> >> locks_get_lock_context(struct inode *inode, int type)
> >> >> >> {
> >> >> >> - struct file_lock_context *new;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >>
> >> >> >> - if (likely(inode->i_flctx) || type == F_UNLCK)
> >> >> >> + /* paired with cmpxchg() below */
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> + if (likely(ctx) || type == F_UNLCK)
> >> >> >> goto out;
> >> >> >>
> >> >> >> - new = kmem_cache_alloc(flctx_cache, GFP_KERNEL);
> >> >> >> - if (!new)
> >> >> >> + ctx = kmem_cache_alloc(flctx_cache, GFP_KERNEL);
> >> >> >> + if (!ctx)
> >> >> >> goto out;
> >> >> >>
> >> >> >> - spin_lock_init(&new->flc_lock);
> >> >> >> - INIT_LIST_HEAD(&new->flc_flock);
> >> >> >> - INIT_LIST_HEAD(&new->flc_posix);
> >> >> >> - INIT_LIST_HEAD(&new->flc_lease);
> >> >> >> + spin_lock_init(&ctx->flc_lock);
> >> >> >> + INIT_LIST_HEAD(&ctx->flc_flock);
> >> >> >> + INIT_LIST_HEAD(&ctx->flc_posix);
> >> >> >> + INIT_LIST_HEAD(&ctx->flc_lease);
> >> >> >>
> >> >> >> /*
> >> >> >> * Assign the pointer if it's not already assigned. If it is, then
> >> >> >> * free the context we just allocated.
> >> >> >> */
> >> >> >> - if (cmpxchg(&inode->i_flctx, NULL, new))
> >> >> >> - kmem_cache_free(flctx_cache, new);
> >> >> >> + if (cmpxchg(&inode->i_flctx, NULL, ctx)) {
> >> >> >> + kmem_cache_free(flctx_cache, ctx);
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> + }
> >> >> >> out:
> >> >> >> - return inode->i_flctx;
> >> >> >> + return ctx;
> >> >> >> }
> >> >> >>
> >> >> >> void
> >> >> >> @@ -762,7 +766,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> >> >> >> struct file_lock_context *ctx;
> >> >> >> struct inode *inode = file_inode(filp);
> >> >> >>
> >> >> >> - ctx = inode->i_flctx;
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (!ctx || list_empty_careful(&ctx->flc_posix)) {
> >> >> >> fl->fl_type = F_UNLCK;
> >> >> >> return;
> >> >> >> @@ -1203,7 +1207,7 @@ int locks_mandatory_locked(struct file *file)
> >> >> >> struct file_lock_context *ctx;
> >> >> >> struct file_lock *fl;
> >> >> >>
> >> >> >> - ctx = inode->i_flctx;
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (!ctx || list_empty_careful(&ctx->flc_posix))
> >> >> >> return 0;
> >> >> >>
> >> >> >> @@ -1388,7 +1392,7 @@ any_leases_conflict(struct inode *inode, struct file_lock *breaker)
> >> >> >> int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >> >> >> {
> >> >> >> int error = 0;
> >> >> >> - struct file_lock_context *ctx = inode->i_flctx;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >> struct file_lock *new_fl, *fl, *tmp;
> >> >> >> unsigned long break_time;
> >> >> >> int want_write = (mode & O_ACCMODE) != O_RDONLY;
> >> >> >> @@ -1400,6 +1404,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >> >> >> new_fl->fl_flags = type;
> >> >> >>
> >> >> >> /* typically we will check that ctx is non-NULL before calling */
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (!ctx) {
> >> >> >> WARN_ON_ONCE(1);
> >> >> >> return error;
> >> >> >> @@ -1494,9 +1499,10 @@ EXPORT_SYMBOL(__break_lease);
> >> >> >> void lease_get_mtime(struct inode *inode, struct timespec *time)
> >> >> >> {
> >> >> >> bool has_lease = false;
> >> >> >> - struct file_lock_context *ctx = inode->i_flctx;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >> struct file_lock *fl;
> >> >> >>
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (ctx && !list_empty_careful(&ctx->flc_lease)) {
> >> >> >> spin_lock(&ctx->flc_lock);
> >> >> >> if (!list_empty(&ctx->flc_lease)) {
> >> >> >> @@ -1543,10 +1549,11 @@ int fcntl_getlease(struct file *filp)
> >> >> >> {
> >> >> >> struct file_lock *fl;
> >> >> >> struct inode *inode = file_inode(filp);
> >> >> >> - struct file_lock_context *ctx = inode->i_flctx;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >> int type = F_UNLCK;
> >> >> >> LIST_HEAD(dispose);
> >> >> >>
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (ctx && !list_empty_careful(&ctx->flc_lease)) {
> >> >> >> spin_lock(&ctx->flc_lock);
> >> >> >> time_out_leases(file_inode(filp), &dispose);
> >> >> >> @@ -1713,9 +1720,10 @@ static int generic_delete_lease(struct file *filp, void *owner)
> >> >> >> struct file_lock *fl, *victim = NULL;
> >> >> >> struct dentry *dentry = filp->f_path.dentry;
> >> >> >> struct inode *inode = dentry->d_inode;
> >> >> >> - struct file_lock_context *ctx = inode->i_flctx;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >> LIST_HEAD(dispose);
> >> >> >>
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (!ctx) {
> >> >> >> trace_generic_delete_lease(inode, NULL);
> >> >> >> return error;
> >> >> >> @@ -2359,13 +2367,14 @@ out:
> >> >> >> void locks_remove_posix(struct file *filp, fl_owner_t owner)
> >> >> >> {
> >> >> >> struct file_lock lock;
> >> >> >> - struct file_lock_context *ctx = file_inode(filp)->i_flctx;
> >> >> >> + struct file_lock_context *ctx;
> >> >> >>
> >> >> >> /*
> >> >> >> * If there are no locks held on this file, we don't need to call
> >> >> >> * posix_lock_file(). Another process could be setting a lock on this
> >> >> >> * file at the same time, but we wouldn't remove that lock anyway.
> >> >> >> */
> >> >> >> + ctx = smp_load_acquire(&file_inode(filp)->i_flctx);
> >> >> >> if (!ctx || list_empty(&ctx->flc_posix))
> >> >> >> return;
> >> >> >>
> >> >> >> @@ -2389,7 +2398,7 @@ EXPORT_SYMBOL(locks_remove_posix);
> >> >> >>
> >> >> >> /* The i_flctx must be valid when calling into here */
> >> >> >> static void
> >> >> >> -locks_remove_flock(struct file *filp)
> >> >> >> +locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
> >> >> >> {
> >> >> >> struct file_lock fl = {
> >> >> >> .fl_owner = filp,
> >> >> >> @@ -2400,7 +2409,6 @@ locks_remove_flock(struct file *filp)
> >> >> >> .fl_end = OFFSET_MAX,
> >> >> >> };
> >> >> >> struct inode *inode = file_inode(filp);
> >> >> >> - struct file_lock_context *flctx = inode->i_flctx;
> >> >> >>
> >> >> >> if (list_empty(&flctx->flc_flock))
> >> >> >> return;
> >> >> >> @@ -2416,10 +2424,8 @@ locks_remove_flock(struct file *filp)
> >> >> >>
> >> >> >> /* The i_flctx must be valid when calling into here */
> >> >> >> static void
> >> >> >> -locks_remove_lease(struct file *filp)
> >> >> >> +locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
> >> >> >> {
> >> >> >> - struct inode *inode = file_inode(filp);
> >> >> >> - struct file_lock_context *ctx = inode->i_flctx;
> >> >> >> struct file_lock *fl, *tmp;
> >> >> >> LIST_HEAD(dispose);
> >> >> >>
> >> >> >> @@ -2439,17 +2445,20 @@ locks_remove_lease(struct file *filp)
> >> >> >> */
> >> >> >> void locks_remove_file(struct file *filp)
> >> >> >> {
> >> >> >> - if (!file_inode(filp)->i_flctx)
> >> >> >> + struct file_lock_context *ctx;
> >> >> >> +
> >> >> >> + ctx = smp_load_acquire(&file_inode(filp)->i_flctx);
> >> >> >> + if (!ctx)
> >> >> >> return;
> >> >> >>
> >> >> >> /* remove any OFD locks */
> >> >> >> locks_remove_posix(filp, filp);
> >> >> >>
> >> >> >> /* remove flock locks */
> >> >> >> - locks_remove_flock(filp);
> >> >> >> + locks_remove_flock(filp, ctx);
> >> >> >>
> >> >> >> /* remove any leases */
> >> >> >> - locks_remove_lease(filp);
> >> >> >> + locks_remove_lease(filp, ctx);
> >> >> >> }
> >> >> >>
> >> >> >> /**
> >> >> >> @@ -2616,7 +2625,7 @@ void show_fd_locks(struct seq_file *f,
> >> >> >> struct file_lock_context *ctx;
> >> >> >> int id = 0;
> >> >> >>
> >> >> >> - ctx = inode->i_flctx;
> >> >> >> + ctx = smp_load_acquire(&inode->i_flctx);
> >> >> >> if (!ctx)
> >> >> >> return;
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> >>
> >>
> >>
> >
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
>
>
>


--
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/