Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline
From: Paul E. McKenney
Date: Thu Jun 28 2018 - 01:11:38 EST
On Wed, Jun 27, 2018 at 07:51:34PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 08:57:21AM -0700, Paul E. McKenney wrote:
> > > Another variant, which simply skips the wakeup whever ran on an offline
> > > CPU, relying on the wakeup from rcutree_migrate_callbacks() right after
> > > the CPU really is dead.
> >
> > Cute! ;-)
> >
> > And a much smaller change.
> >
> > However, this means that if someone indirectly and erroneously causes
> > rcu_report_qs_rsp() to be invoked from an offline CPU, the result is an
> > intermittent and difficult-to-debug grace-period hang. A lockdep splat
> > whose stack trace directly implicates the culprit is much better.
>
> How so? We do an unconditional wakeup right after finding the offline
> cpu dead. There is only very limited code between offline being true and
> the CPU reporting in dead.
I am thinking more generally than this particular patch. People
sometimes invoke things from places they shouldn't, for example, the
situation leading to your patch that allows use of RCU much earlier in
the CPU-online process. It is nicer to get a splat in those situations
than a silent hang.
Thanx, Paul