Re: [RFC v2 0/6] Track RCU dereferences in RCU read-side critical sections

From: Boqun Feng
Date: Sun Feb 28 2016 - 20:13:02 EST


On Fri, Feb 26, 2016 at 12:25:21PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 26, 2016 at 11:06:27AM +0800, Boqun Feng wrote:
> > On Thu, Feb 25, 2016 at 07:37:24AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 25, 2016 at 03:32:43PM +0100, Peter Zijlstra wrote:
> > > > On Tue, Feb 16, 2016 at 01:57:39PM +0800, Boqun Feng wrote:
> > > > > As a characteristic of RCU, read-side critical sections have a very
> > > > > loose connection with rcu_dereference()s, which is you can only be sure
> > > > > about an rcu_dereference() might be called in some read-side critical
> > > > > section, but if code gets complex, you may not be sure which read-side
> > > > > critical section exactly, this might be also an problem for some other
> > > > > locking mechanisms, that is the critical sections protecting data and
> > > > > the data accesses protected are not clearly correlated.
> > > > >
> > > > > In this series, we are introducing LOCKED_ACCESS framework and based on
> > > > > which, we implement the RCU_LOCKED_ACCESS functionality to give us a
> > > > > clear hint: which rcu_dereference() happens in which RCU read-side
> > > > > critical section.
> > > > >
> > > > > After this series applied, and if CONFIG_RCU_LOCKED_ACCESS=y, the proc
> > > > > file /proc/locked_access/rcu will show all relationships collected so
> > > > > far for rcu_read_lock() and their friends and rcu_dereference*().
> > > >
> > > > But why !? What does this bring us, why do I want to even look at these
> > > > patches?
> > >
> > > There were some complaints about the difficulty of figuring out what
> > > was being protected by a given rcu_read_lock() in cases where the
> > > corresponding rcu_dereference() is several function calls down, and
> > > especially in cases where the function calls are via pointers.
> > > These cases show up in a number of places, perhaps most prominently
> > > in networking.
> > > Boqun's patches therefore use lockdep to make an association between
> > > each rcu_dereference() and the rcu_read_lock() protecting it.
> > >
> >
> > Yes, this patch is aiming to provide some information about this,
> > actually the background of this patchset is a discussion between Ingo
> > and Paul last year:
> >
> > http://lkml.kernel.org/g/20150414102505.GA13015@xxxxxxxxx
> >
> > In that discussion, Ingo gave one example about how hard it is to figure
> > out what a RCU read-side critical section is protecting. Ingo's proposal
> > of solving this problem was to add an extra parameter for RCU related
> > primitives, however, Paul seemed to have concerns about that approach,
> > because there were many RCU users' code that would need to be modified
> > and there were corner cases where the one extra parameter was not enough
> > or not necessary.
> >
> > Therefore I tried to figure out a way for making the association of
> > rcu_dereference() and rcu_read_lock() automatically without the
> > modification of code of the RCU users. That's how this patchset
> > comes.
>
> So the problem I have with this is that your text didn't have a problem
> statement, and if there's no problem, what the heck is all this code
> for.
>

You are right, that's my bad. I should have done this better. Sorry
about that..

Please allow me to restate the problem and what this patchset can
help ;-)

The problem is:

Currently there is no way to know which rcu_dereference() and its
friends a rcu_read_lock() or one of its friends is protecting. And
the lack of such information is a pain for kernel developers and
reviewers to understand the purpose of a certain rcu_read_lock().

And this patchset is trying to provide the information to developers.

> Secondly, I'm not sure a random list of data is going to help anybody.
>

One possible usage might be:

If someone is going to refactor a piece of code which originally uses
RCU for synchronization, and now he wants to switch to other
synchronization mechanism for some data because of some reason(for
consistency or writer side latency guarantee, etc.). Before he makes the
change, he needs to know if this rcu_read_lock() also protects other
data, and this is not an easy job and I don't think we have anything
helpful for this. With RCU_LOCKED_ACCESS, he can build the kernel with
RCU_LOCKED_ACCESS enabled, run some workloads and get the information
from a /proc file. In this way, RCU_LOCKED_ACCESS is helpful to
developers who want to know the information of the associations between
rcu_read_lock() and rcu_dereference().

> The whole point of lockdep is that it tells you if you do broken. This
> doesn't, and cannot, because it lacks information to verify against.
>

Yes.. RCU_LOCKED_ACCESS is not a validator, so it cannot be as useful as
lockdep for finding bug.

> So I'm still not sure this is useful. Also, I would argue your code has
> problems if you cannot even find your rcu_read_lock().
>

I think what you mean here is, for example, the case where we use
preempt_disable() instead of rcu_read_lock_sched() to pair with
synchronize_sched(), right?

If so, sure, this series cannot do anything about this, however, this
series is to correlate rcu_read_lock() and its friends with
rcu_dereference() and its friends. So it may miss some RCU usages if
rcu_read_lock() and its friends are not used. But these missing usages
can be covered in a similar way as this series, I can either add it
in the next version of this series or provide it as a separate
follow-up patchset, of course, if we all agree the problem is worth
solving and this patchset is helpful.

Or I'm missing your point here?

> And yes, networking is a special snowflake, they very often push their
> rcu_read_lock(),local_bh_disable(),preempt_disable() whatnot, all the
> way out, because cycles.
>
> Which makes it entirely impossible to tell what all it guards. That's a
> very concious choice they make, but its a royal pain for a number of
> things. I'm not sure we can do anything about it.
>
> So no, I'm not convinced. Again, why is this useful?

Sorry again for not providing enough text for you to review this series,
hope now it's more clear.

Looking forwards to your thoughts, thank you ;-)

If you think this series deserves a reconsideration I will send out a
new version with more text about what's the problem and why this is
useful.

Regards,
Boqun

Attachment: signature.asc
Description: PGP signature