Re: tty^Wrcu/perf lockdep trace.

From: Paul E. McKenney
Date: Mon Oct 07 2013 - 08:59:43 EST


On Mon, Oct 07, 2013 at 01:24:21PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 04, 2013 at 05:23:48PM -0700, Paul E. McKenney wrote:
> > The underlying problem is that perf is invoking call_rcu() with the
> > scheduler locks held, but in NOCB mode, call_rcu() will with high
> > probability invoke the scheduler -- which just might want to use its
> > locks. The reason that call_rcu() needs to invoke the scheduler is
> > to wake up the corresponding rcuo callback-offload kthread, which
> > does the job of starting up a grace period and invoking the callbacks
> > afterwards.
> >
> > One solution (championed on a related problem by Lai Jiangshan) is to
>
> That's rcu_read_unlock_special(), right?

Yep, different location, same general approach.

> > simply defer the wakeup to some point where scheduler locks are no longer
> > held. Since we don't want to unnecessarily incur the cost of such
> > deferral, the task before us is threefold:
> >
> > 1. Determine when it is likely that a relevant scheduler lock is held.
> >
> > 2. Defer the wakeup in such cases.
> >
> > 3. Ensure that all deferred wakeups eventually happen, preferably
> > sooner rather than later.
> >
> > We use irqs_disabled_flags() as a proxy for relevant scheduler locks
> > being held. This works because the relevant locks are always acquired
> > with interrupts disabled. We may defer more often than needed, but that
> > is at least safe.
>
> Fair enough; do you feel the need for something more specific?

Maybe...

One advantage of the current approach is that it deals with any possible
deadlock. If we were to focus on specific locks, my fear is that we
would be repeatedly chasing this sort of issue as new locks were added.

> > The wakeup deferral is tracked via a new field in the per-CPU and
> > per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.
> >
> > This flag is checked by the RCU core processing. The __rcu_pending()
> > function now checks this flag, which causes rcu_check_callbacks()
> > to initiate RCU core processing at each scheduling-clock interrupt
> > where this flag is set. Of course this is not sufficient because
> > scheduling-clock interrupts are often turned off (the things we used to
> > be able to count on!). So the flags are also checked on entry to any
> > state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
> > state and NO_HZ_FULL user-mode-execution state.
>
> So RCU doesn't current differentiate between EQS for nr_running==1 and
> nr_running==0?

Between usermode nr_running==1 and idle, you mean?

It has separate APIs for these two cases, but they both call the same
functions. So yes, I can put a check in one place and catch both of
these two cases.

> > This approach should allow call_rcu() to be invoked regardless of what
> > locks you might be holding, the key word being "should".
>
> Agreed. Except it looks like you've inverted the deferred wakeup
> condition :-)

Good point -- will send out updated patch.

/me wonders what else he messed up...

Thanx, Paul

> > @@ -2314,6 +2323,22 @@ static int rcu_nocb_kthread(void *arg)
> > return 0;
> > }
> >
> > +/* Is a deferred wakeup of rcu_nocb_kthread() required? */
> > +static bool rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
> > +{
> > + return ACCESS_ONCE(rdp->nocb_defer_wakeup);
> > +}
> > +
> > +/* Do a deferred wakeup of rcu_nocb_kthread(). */
> > +static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
> > +{
> > + if (rcu_nocb_need_deferred_wakeup(rdp))
>
> !rcu_nocb_need_deferred_wakeup() ?
>
> > + return;
> > + ACCESS_ONCE(rdp->nocb_defer_wakeup) = false;
> > + wake_up(&rdp->nocb_wq);
> > + trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, TPS("DeferredWakeEmpty"));
> > +}
> > +
> > /* Initialize per-rcu_data variables for no-CBs CPUs. */
> > static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> > {
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/