Re: deadlock in synchronize_srcu() in debugfs?

From: Paul E. McKenney
Date: Fri Mar 24 2017 - 13:46:58 EST


On Fri, Mar 24, 2017 at 10:24:46AM +0100, Johannes Berg wrote:
> Hi,
>
> On Fri, 2017-03-24 at 09:56 +0100, Johannes Berg wrote:
> > On Thu, 2017-03-23 at 16:29 +0100, Johannes Berg wrote:
> > > Isn't it possible for the following to happen?
> > >
> > > CPU1                                  CPU2
> > >
> > > mutex_lock(&M); // acquires mutex
> > >                                       full_proxy_xyz();
> > >                                       srcu_read_lock(&debugfs_srcu);
> > >                                       real_fops->xyz();
> > >                                       mutex_lock(&M); // waiting for mutex
> > > debugfs_remove(F);
> > > synchronize_srcu(&debugfs_srcu);
>
> > So I'm pretty sure that this can happen. I'm not convinced that it's
> > happening here, but still.
>
> I'm a bit confused, in that SRCU, of course, doesn't wait until all the
> readers are done - that'd be a regular reader/writer lock or something.

Agreed, synchronize_srcu() does not have to wait for new readers
(as a reader/writer lock would), but it -does- have to wait for
pre-existing readers, like the one shown in your example above.

> However, it does (have to) wait until all the currently active read-
> side sections have terminated, which still leads to a deadlock in the
> example above, I think?

Yes. CPU2 has a pre-existing reader that CPU1's synchronize_srcu()
must wait for. But CPU2's reader cannot end until CPU1 releases
its lock, which it cannot do until after CPU2's reader ends. Thus,
as you say, deadlock.
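
To make this concrete, here is a rough sketch of the pattern in
question (simplified and partly hypothetical: real_fops here stands
for the looked-up real file operations, and the actual fs/debugfs
proxy code differs in detail):

	static ssize_t full_proxy_read(struct file *filp, char __user *buf,
				       size_t count, loff_t *ppos)
	{
		ssize_t ret;
		int idx;

		idx = srcu_read_lock(&debugfs_srcu);
		/* If ->read() blocks on a mutex that is held across
		 * debugfs_remove(), this reader never ends, and the
		 * corresponding synchronize_srcu() never returns. */
		ret = real_fops->read(filp, buf, count, ppos);
		srcu_read_unlock(&debugfs_srcu, idx);
		return ret;
	}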

The rule is that if you are within any kind of RCU read-side critical
section, you cannot directly or indirectly wait for a grace period from
that same RCU flavor.
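
In its most direct form (a minimal sketch, with a made-up my_srcu
just for illustration):

	int idx = srcu_read_lock(&my_srcu);

	/* Anything from here on that calls synchronize_srcu(&my_srcu),
	 * directly or through a callee, waits for this very reader to
	 * finish, which it never will: deadlock. */
	synchronize_srcu(&my_srcu);

	srcu_read_unlock(&my_srcu, idx);  /* never reached */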

> In his 2006 LWN article Paul wrote:
>
> The designer of a given subsystem is responsible for: (1) ensuring
> that SRCU read-side sleeping is bounded and (2) limiting the amount
> of memory waiting for synchronize_srcu(). [1]
>
> In the case of debugfs files acquiring locks, (1) can't really be
> guaranteed, especially if those locks can be held while doing
> synchronize_srcu() [via debugfs_remove], so I still think the lockdep
> annotation needs to be changed to at least have some annotation at
> synchronize_srcu() time so we can detect this.

That would be very nice!

There are some challenges, though. This is OK:

CPU1                                  CPU2
i = srcu_read_lock(&mysrcu);          mutex_lock(&my_lock);
mutex_lock(&my_lock);                 i = srcu_read_lock(&mysrcu);
srcu_read_unlock(&mysrcu, i);         mutex_unlock(&my_lock);
mutex_unlock(&my_lock);               srcu_read_unlock(&mysrcu, i);

CPU3
synchronize_srcu(&mysrcu);

This could be a deadlock for reader-writer locking, but not for SRCU:
srcu_read_lock() never blocks, so the apparent lock-order inversion
between CPU1 and CPU2 is harmless, and CPU3's grace period simply
waits for both readers to complete.

This is also OK:

CPU1                                  CPU2
i = srcu_read_lock(&mysrcu);          mutex_lock(&my_lock);
mutex_lock(&my_lock);                 synchronize_srcu(&yoursrcu);
srcu_read_unlock(&mysrcu, i);         mutex_unlock(&my_lock);
mutex_unlock(&my_lock);

Here CPU1's read-side critical sections are for mysrcu, which is
independent of CPU2's grace period for yoursrcu.

So you could flag any lockdep cycle that contained a reader and a
synchronous grace period for the same flavor of RCU, where for SRCU the
identity of the srcu_struct structure is part of the flavor.
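
For illustration, one way such an annotation might look (a rough
sketch only; the _annotated wrappers are made up for this example,
and the dep_map field and lockdep helpers are assumptions here, not
a description of the current implementation):

	/* Readers do a recursive-read acquisition of a per-srcu_struct
	 * lockdep map. */
	static inline int srcu_read_lock_annotated(struct srcu_struct *sp)
	{
		int idx = srcu_read_lock(sp);

		rcu_lock_acquire(&sp->dep_map);
		return idx;
	}

	static inline void srcu_read_unlock_annotated(struct srcu_struct *sp,
						      int idx)
	{
		rcu_lock_release(&sp->dep_map);
		srcu_read_unlock(sp, idx);
	}

	/* The grace period does a write acquire/release of the same map,
	 * so a dependency cycle through a reader of the same srcu_struct
	 * becomes visible to lockdep. */
	static inline void synchronize_srcu_annotated(struct srcu_struct *sp)
	{
		lock_map_acquire(&sp->dep_map);
		lock_map_release(&sp->dep_map);
		synchronize_srcu(sp);
	}

This would flag your debugfs example (reader, then M, then grace
period, then back to the reader) while still permitting both of the
OK cases above, because the srcu_struct identity keeps the flavors
apart.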

> Now, I still suspect there's some other bug here in the case that I'm
> seeing, because I don't actually see the "mutex_lock(&M); // waiting"
> piece in the traces. I'll need to run this with some tracing on Monday
> when the test guys are back from the weekend.
>
> I'm also not sure how I can possibly fix this in the debugfs code in
> mac80211 and friends, but that's perhaps a different story. Clearly,
> this debugfs patch is a good thing - the code would likely have had
> use-after-free problems in this situation without it. But flagging
> the potential deadlocks would make it a lot easier to find them.

No argument here!

Thanx, Paul