Re: [PATCH v2 tip/core/rcu 04/39] srcu: Check for tardy grace-period activity in cleanup_srcu_struct()

From: Paul E. McKenney
Date: Tue Apr 18 2017 - 14:35:25 EST


On Mon, Apr 17, 2017 at 05:34:30PM -0700, Josh Triplett wrote:
> On Mon, Apr 17, 2017 at 05:33:32PM -0700, Josh Triplett wrote:
> > On Mon, Apr 17, 2017 at 04:44:51PM -0700, Paul E. McKenney wrote:
> > > Users of SRCU are obliged to complete all grace-period activity before
> > > invoking cleanup_srcu_struct(). This means that all calls to either
> > > synchronize_srcu() or synchronize_srcu_expedited() must have returned,
> > > and all calls to call_srcu() must have returned, and the last call to
> > > call_srcu() must have been followed by a call to srcu_barrier().
> > > Furthermore, the caller must have done something to prevent any
> > > further calls to synchronize_srcu(), synchronize_srcu_expedited(),
> > > and call_srcu().
> > >
> > > Therefore, if there has ever been an invocation of call_srcu() on
> > > the srcu_struct in question, the sequence of events must be as
> > > follows:
> > >
> > > 1. Prevent any further calls to call_srcu().
> > > 2. Wait for any pre-existing call_srcu() invocations to return.
> > > 3. Invoke srcu_barrier().
> > > 4. It is now safe to invoke cleanup_srcu_struct().
> > >
> > > On the other hand, if there has ever been a call to synchronize_srcu()
> > > or synchronize_srcu_expedited(), the sequence of events must be as
> > > follows:
> > >
> > > 1. Prevent any further calls to synchronize_srcu() or
> > > synchronize_srcu_expedited().
> > > 2. Wait for any pre-existing synchronize_srcu() or
> > > synchronize_srcu_expedited() invocations to return.
> > > 3. It is now safe to invoke cleanup_srcu_struct().
> > >
> > > If there have been calls to both types of functions (call_srcu()
> > > and either of synchronize_srcu() and synchronize_srcu_expedited()), then
> > > the caller must do the first three steps of the call_srcu() procedure
> > > above and the first two steps of the synchronize_s*() procedure above,
> > > and only then invoke cleanup_srcu_struct().
> >
> > This commit message clearly explains the correct sequence for the
> > client, but not which aspects of this the change now enforces. Some of
> > the steps above remain the responsibility of the caller, while the
> > callee now checks more of them. Could you add something at the end
> > explaining the change and what it enforces?
>
> More importantly, perhaps this explanation could find its way into the
> documentation of cleanup_srcu_struct?

Like this?

/**
 * cleanup_srcu_struct - deconstruct a sleep-RCU structure
 * @sp: structure to clean up.
 *
 * Must invoke this only after you are finished using a given srcu_struct
 * that was initialized via init_srcu_struct(). This code does some
 * probabilistic checking, spotting late uses of srcu_read_lock(),
 * synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu().
 * If any such late uses are detected, the per-CPU memory associated with
 * the srcu_struct is simply leaked and WARN_ON() is invoked. If the
 * caller frees the srcu_struct itself, a use-after-free crash will likely
 * ensue, but at least there will be a warning printed.
 */
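
To make the check order concrete, here is a rough userspace model of
what the patched function does. It is only a sketch: struct mock_srcu,
its fields, and mock_cleanup_srcu_struct() are hypothetical stand-ins
rather than kernel code, and the flush_delayed_work() step is
represented by a comment instead of being modeled.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in; NOT the kernel's struct srcu_struct. */
struct mock_srcu {
	int readers_active;	/* models srcu_readers_active(sp) */
	int batches_pending;	/* models !rcu_all_batches_empty(sp) */
	bool running;		/* models sp->running */
	bool percpu_freed;	/* set once the per-CPU state is released */
};

/*
 * Models the check order in the patched cleanup_srcu_struct().
 * Returns true if the per-CPU state was freed, false if it was leaked
 * because a late use was detected (where the kernel would WARN_ON()).
 */
static bool mock_cleanup_srcu_struct(struct mock_srcu *sp)
{
	if (sp->readers_active)
		return false;	/* Late srcu_read_lock(). */
	if (sp->batches_pending)
		return false;	/* Callbacks still queued. */
	/* The kernel calls flush_delayed_work(&sp->work) at this point. */
	if (sp->running)
		return false;	/* Grace-period work still active. */
	sp->percpu_freed = true;
	return true;
}
```

In this model, any one of the three conditions being set is enough to
leave percpu_freed false, which corresponds to the leak described in
the comment above.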

I added the following paragraph to the commit log:

Note that cleanup_srcu_struct() does some probabilistic checks
for the caller failing to follow these procedures, in which
case cleanup_srcu_struct() does WARN_ON() and avoids freeing
the per-CPU structures associated with the specified srcu_struct
structure.
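
As a rough illustration of the call_srcu() ordering that the commit
log asks callers to follow, the steps can be modeled in userspace as
below. Everything here is an assumption made for illustration: struct
mock_srcu_user and the mock_*() helpers are hypothetical, and "waiting"
is simulated by simple counter updates rather than real
synchronization.

```c
#include <stdbool.h>

/* Hypothetical model of one SRCU user's state; not a kernel type. */
struct mock_srcu_user {
	bool accepting_calls;	/* May new call_srcu() invocations start? */
	int calls_in_flight;	/* call_srcu() invocations not yet returned. */
	int callbacks_queued;	/* Callbacks posted but not yet invoked. */
};

/* Step 1: prevent any further calls to call_srcu(). */
static void mock_stop_new_calls(struct mock_srcu_user *u)
{
	u->accepting_calls = false;
}

/* Step 2: wait for pre-existing call_srcu() invocations to return. */
static void mock_wait_for_calls(struct mock_srcu_user *u)
{
	/* Each in-flight invocation queues its callback, then returns. */
	u->callbacks_queued += u->calls_in_flight;
	u->calls_in_flight = 0;
}

/* Step 3: models srcu_barrier(), which waits for queued callbacks. */
static void mock_srcu_barrier(struct mock_srcu_user *u)
{
	u->callbacks_queued = 0;
}

/* Step 4 precondition: is it safe to invoke cleanup_srcu_struct()? */
static bool mock_safe_to_cleanup(const struct mock_srcu_user *u)
{
	return !u->accepting_calls &&
	       u->calls_in_flight == 0 &&
	       u->callbacks_queued == 0;
}
```

Skipping the barrier step leaves callbacks_queued nonzero in this
model, which is the sort of tardy activity the new checks are meant to
catch.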

And added your Reviewed-by, but please let me know if more is needed.

Thanx, Paul

> > > Reported-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> >
> > With the above change:
> > Reviewed-by: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >
> > > kernel/rcu/srcu.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
> > > index ba41a5d04b49..6beeba7b0b67 100644
> > > --- a/kernel/rcu/srcu.c
> > > +++ b/kernel/rcu/srcu.c
> > > @@ -261,6 +261,11 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
> > >  {
> > >  	if (WARN_ON(srcu_readers_active(sp)))
> > >  		return; /* Leakage unless caller handles error. */
> > > +	if (WARN_ON(!rcu_all_batches_empty(sp)))
> > > +		return; /* Leakage unless caller handles error. */
> > > +	flush_delayed_work(&sp->work);
> > > +	if (WARN_ON(sp->running))
> > > +		return; /* Caller forgot to stop doing call_srcu()? */
> > >  	free_percpu(sp->per_cpu_ref);
> > >  	sp->per_cpu_ref = NULL;
> > >  }
> > > --
> > > 2.5.2
> > >
>