Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()

From: Paul E. McKenney
Date: Tue Apr 14 2015 - 11:11:37 EST

Next message: Greg Kurz: "Re: [PATCH v4 7/8] vhost: feature to set the vring endianness"
Previous message: Josh Poimboeuf: "Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1"
In reply to: Ingo Molnar: "Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()"
Next in thread: Peter Zijlstra: "Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Apr 14, 2015 at 04:31:15PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > > I wish rcu_read_lock() had a data argument, for similar reasons -
> > > even if it just pointed to a pre-existing lock or an rcu head it
> > > never touches ;-)
> >
> > Heh! Jack Slingwine and I had that argument back in 1993. I
> > advocated placing the update-side lock into the rcu_read_lock()
> > equivalent, and he responded by showing me a use cases were (1)
> > there were no update-side locks and (2) there were many update-side
> > locks, and it was impossible to select just one on the read side.
> > ;-)
>
> So as a response I'd have attempted to hand-wave something about those
> scenarios being either not that common, or not that interesting?!! :-)

Unfortunately for me, Jack already had code actually implementing those
particular use cases. ;-)

> > However, DYNIX/ptx did not have anything like rcu_dereference() or
> > list_for_each_entry_rcu(), which perhaps can be used in your example
> > below. (Hey, that was 20 years ago, when 50MB was a lot of main
> > memory. So we relied on compilers being quite dumb.)
>
> I guess compilers are still dumb in many ways ;-)

But often dumb in much more complicated ways. ;-)

> > > know that cpusets are integrated with cgroups and I search
> > > kernel/cgroup.c for call_rcu(), do I find:
> > >
> > > call_rcu(&css->rcu_head, css_free_rcu_fn);
> > >
> > > aha!
> > >
> > > ... or if I drill down 3 levels into cpuset_for_each_child() ->
> > > css_for_each_child() -> css_next_child() do I see the RCU
> > > iteration.
> >
> > And I have felt that reviewing pain as well.
> >
> > But shouldn't these API members be tagged with "_rcu" to make that
> > more clear? Sort of like the difference between list_for_each_entry
> > and list_for_each_entry_rcu()?
>
> Yes, agreed absolutely!
>
> Having it as a syntactic element instead of a stylistic one forces
> such self-documentation though.
>
> At the cost of being an extra nuisance.

But this problem of important stuff happening multiple function calls
down the stack must be happening in contexts other than just RCU.
Perhaps we need something that can query the code? I could almost make
cbmc do this, but it is a bit aggressive about pointers to functions,
among other things. (The trick is to make the rcu_dereference() macros
be functions, and to have them all invoke the top-level function, and then
look at cbmc'c complaints about recursion, which would trace the path.)

> > > It would have been a lot clearer from the onset, if I had a hint
> > > syntactically:
> > >
> > > rcu_read_lock(&css->rcu_head);
> > > ...
> > > rcu_read_unlock(&css->rcu_head);
> >
> > I cannot resist asking what you put there if the update side uses
> > synchronize_rcu()... A NULL pointer? A pointer to
> > synchronize_rcu()? Something else? [...]
>
> So I'd either put a suitable zero-size struct ('struct
> rcu_head_sync'?) into the protected data structure: i.e. still
> annotate it in an active fashion, but don't change any code.
>
> This would be zero size in the non-debug case, but it would allow
> debugging data in the debug case.
>
> Another solution would be to not require such linking in all cases,
> only when there's a single update side lock.

Or we could have a global struct rcu_head_sync variable for each
such use, I guess.

> > [...] And what do you do in the not-uncommon case where multiple
> > RCU chains are being traversed in the same RCU read-side critical
> > section? One approach would be to use varargs, I suppose. Though
> > with a hash table, list, or tree, you could have a -lot- of
> > ->rcu_head structures to reference, and concurrent additions and
> > deletions mean that you wouldn't necessarily know which at
> > rcu_read_lock() time.
>
> I'd definitely keep it simple - i.e. no varargs.
>
> I'd just try to link with the data structure in general. Or I'd just
> forget about handling these cases altogether, at least initially -
> first see how it works out for the simpler cases.

Or have nested rcu_read_lock() calls, one for each protected data
structure. Sorry, couldn't resist. ;-)

> > > beyond the reviewer bonus I bet this would allow some extra debugging
> > > as well (only enabled in debug kernels):
> > >
> > > - for example to make sure we only access a field if _that field_ is
> > > RCU locked (reducing the chance of having the right locking for
> > > the wrong reason)
> >
> > One possibility would be to mark each traversal of an RCU-protected
> > pointer. Currently, if a multilinked structure is inserted in one
> > shot, only the initial pointer to that structure needs to have
> > rcu_dereference(). Otherwise, it is hard to tell exactly how far
> > the RCU protection is to extend. (Been having too much fun with
> > this sort of thing in the standards committees...)
> >
> > > - we could possibly also build lockdep dependencies out of such
> > > annotated RCU locking patterns.
> >
> > Tell me more?
>
> So to backpedal a bit, I think that in practice the use
> rcu_dereference*() gives us a lot of protection and documentation
> already - so I might be over-doing it.

One alternative would be a tool that could dig out the rcu_dereference()
calls that could be reached from a given region of code -- my cbmc
example above. Another would be to instrument the rcu_dereference() calls
somehow, though I am not yet exactly sure how -- perhaps automatically
making an association (at runtime) between a given rcu_dereference()
and all the rcu_read_lock() calls that protected.

> But we could essentially split up the current monolithic
> rcu_read_lock() class and turn it into classes mirroring the update
> side lock classes.
>
> This would at minimum give us access to CONFIG_LOCK_STAT statistics
> about the frequency of use of the various locks. Right now I can only
> see this:
>
> /proc/lockdep:ffffffff82d27887 OPS: 1412000 FD: 1 BD: 1 ......: rcu_read_lock
> /proc/lock_stat: rcu_read_lock-R: 0 0 0.00 0.00 0.00 0.00 0 1135745 0.00 969.66 1355676.33 1.19
> /proc/lock_stat: rcu_read_lock_sched-R: 0 0 0.00 0.00 0.00 0.00 0 16 0.25 20.71 30.40 1.90
> /proc/lock_stat: rcu_read_lock_bh-R: 0 0 0.00 0.00 0.00 0.00 0 5602 0.33 126.70 40478.88 7.23
>
> which merges them into essentially a single counter.
>
> If we had a more finegrained structure we could tell more about usage
> patterns? Not sure how valuable that is though.

If we could generate the fine-grained view automatically, it seems to
me that it could be worthwhile. I am a bit concerned about any approach
that requires accurate manual annotation of each of the 1900 instances
of rcu_read_lock() currently in the kernel.

> So for example on the SRCU side we could detect such deadlocks:
>
> mutex_lock(&mutex);
> synchronize_srcu(&srcu);
>
> vs.
>
> srcu_read_lock(&srcu);
> mutex_lock(&mutex);
>
> (I don't think we are detecting these right now.)

For one thing, we would have to reject this sort of false positive:

read_lock(&rw_lock);
...
srcu_read_lock(&srcu);
...
srcu_read_unlock(&srcu);
...
read_unlock(&rw_lock);

vs.

srcu_read_lock(&srcu);
...
read_lock(&rw_lock);
...
read_unlock(&rw_lock);
...
srcu_read_unlock(&srcu);

But yes, I suspect that we have been getting lucky because SRCU is not
yet used heavily.

> On the classical RCU side it's harder to construct deadlocks, because
> the read side is a non-sleeping and irq-safe primitive, while
> synchronize_rcu() is a sleeping method ;-) So most (all?) deadlock
> scenarios are avoided by just those properties.

I can't think of a deadload that would not be detected by those
properties.

> Another use would be to potentially split up the RCU read side into
> multiple grace period domains, flushed independently, a bit like how
> SRCU does it? That would be a non-debugging use for it.

Well, if we could do a lockdep-like association of the rcu_dereference()
calls with the rcu_read_lock() calls that protected them, would the
resulting clusters of association give us what we need? Couldn't
we then dump the lockdep sysfs files and see which rcu_dereference()
calls were protected by a given rcu_read_lock() call?

> > > - RCU aware list walking primitives could auto-check that this
> > > particular list is properly RCU locked.
> >
> > For example, that a lock in the proper update class was held during
> > the corresponding update?
>
> Yes, and also on the lookup side: that a 'lock' of the proper type is
> read-held during lookup. (with a few special annotations for special
> cases where as a side effect of other things we may have proper RCU
> read side protection.)

Hmmm... Suppose we also used lockdep to associate rcu_assign_pointer()
with the locks protecting it. We could easily find the set of
rcu_dereference() calls accessing that same RCU-protected pointer, and
then associate the update-side locks with the RCU read-side critical
sections.

Might be a very large pile of data in some cases, thoough! Which is
all the more reason for doing the associations automatically.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Greg Kurz: "Re: [PATCH v4 7/8] vhost: feature to set the vring endianness"
Previous message: Josh Poimboeuf: "Re: [PATCH 2/2] livepatch: Fix the bug if the function name is larger than KSYM_NAME_LEN-1"
In reply to: Ingo Molnar: "Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()"
Next in thread: Peter Zijlstra: "Re: [PATCH v5 05/10] seqlock: Better document raw_write_seqcount_latch()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]