Re: [PATCH 3/3] srcu: Fix broken node geometry after early ssp init

From: Paul E. McKenney
Date: Fri Apr 02 2021 - 18:12:36 EST


On Fri, Apr 02, 2021 at 10:50:38PM +0200, Frederic Weisbecker wrote:
> On Fri, Apr 02, 2021 at 08:03:57AM -0700, Paul E. McKenney wrote:
> > On Fri, Apr 02, 2021 at 12:02:21PM +0200, Frederic Weisbecker wrote:
> > > Arguably that's quite a corner case and I don't expect anyone to call
> > > start_poll_synchronize_srcu() so early but who knows. An alternative is to
> > > forbid it and warn if used before srcu is initialized.
> >
> > Another approach would be to have start_poll_synchronize_srcu() check
> > whether initialization has happened, and if not, simply queue a callback.
> >
> > Any other ways to make this work?
>
> OK, I think that should work. We want to make sure that the cookie
> returned by each call to start_poll_synchronize_srcu() before
> rcu_init_geometry() matches the gpnum targeted by the corresponding
> callback we requeue.
>
> Since we are very early in boot and workqueues can't run yet, no grace
> period should be able to complete. Assume ssp->srcu_gp_seq is initialized
> to N. The first call to call_srcu()/start_poll_synchronize_srcu() should
> target gpnum N + 1, and all calls that follow should target gpnum N + 2
> and no further.
>
> When we call srcu_init() and requeue the callbacks in order after
> resetting gpnum to N, this should behave the same way and assign the
> right gpnum to each callback.
>
> It would be a problem if the workqueues could run and complete grace
> periods concurrently, because we might then target gpnum N + 3, N + 4, ...
> as we requeue the callbacks. But that's not the case, so we should be
> fine as long as the callbacks are requeued in order.
>
> Does that sound right to you as well? If so I can try it.

Makes sense to me!
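A rough user-space model of the plan quoted above, for illustration only. Everything in it is hypothetical: struct model_ssp, model_early_request() and model_srcu_init() are made-up stand-ins, not kernel APIs, and the sequence arithmetic merely mimics the layout of the kernel's rcu_seq helpers (two low state bits, so with N = 0 a cookie of 4 means grace period N + 1 and 8 means N + 2). It assumes, as the quoted reasoning does, that no grace period can complete before srcu_init(), and checks that resetting the counter to N and requeueing in order reproduces the cookies handed out earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Mimics the rcu_seq layout: the low two bits hold grace-period state,
 * the bits above them count grace periods. */
#define SEQ_CTR_SHIFT  2
#define SEQ_STATE_MASK ((1UL << SEQ_CTR_SHIFT) - 1)
#define MAX_EARLY      8

/* Hypothetical, heavily simplified stand-in for struct srcu_struct. */
struct model_ssp {
	unsigned long gp_seq;                   /* models ssp->srcu_gp_seq */
	bool initialized;                       /* has model_srcu_init() run? */
	unsigned long early_cookies[MAX_EARLY]; /* targets of pre-init requests */
	int n_early;
};

/* Cookie for the first grace period that starts after this point. */
static unsigned long seq_snap(unsigned long seq)
{
	return (seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Pick a target grace period and start one if none is in progress.
 * Nothing here ever completes a grace period: before srcu_init() the
 * workqueues that would do so cannot run. */
static unsigned long target_gp(struct model_ssp *ssp)
{
	unsigned long cookie = seq_snap(ssp->gp_seq);

	if (!(ssp->gp_seq & SEQ_STATE_MASK))
		ssp->gp_seq++;                  /* idle -> in progress */
	return cookie;
}

/* Hypothetical early call_srcu()/start_poll_synchronize_srcu() request:
 * before init, just remember which grace period it targeted. */
static unsigned long model_early_request(struct model_ssp *ssp)
{
	unsigned long cookie = target_gp(ssp);

	if (!ssp->initialized) {
		assert(ssp->n_early < MAX_EARLY);
		ssp->early_cookies[ssp->n_early++] = cookie;
	}
	return cookie;
}

/* Hypothetical srcu_init(): reset the counter to N (state bits clear),
 * then requeue the early requests in their original order. Returns true
 * if every requeued request targets the same grace period as the cookie
 * handed out before init. */
static bool model_srcu_init(struct model_ssp *ssp, unsigned long n)
{
	ssp->gp_seq = n;
	ssp->initialized = true;
	for (int i = 0; i < ssp->n_early; i++)
		if (target_gp(ssp) != ssp->early_cookies[i])
			return false;
	return true;
}
```

With N = 0, three early requests return cookies 4, 8 and 8 (N + 1, then N + 2 twice), and model_srcu_init(&ssp, 0) then returns true. The model has no way for a grace period to complete, which is exactly the assumption the quoted argument relies on.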

There also needs to be an additional check in start_poll_synchronize_srcu()
to avoid a double call_srcu() of a single rcu_head structure. But
everything is single-threaded at that point, and this check comes after
the check for already being initialized, so this should not be a problem.
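A similar sketch of the extra check described above, again with hypothetical names (model_ssp, model_start_poll_early(), early_head_queued) rather than real kernel symbols. It assumes that a single rcu_head-like structure embedded in the srcu_struct is reused by every pre-init call, which is why a second call_srcu() on it must be suppressed; a plain bool suffices only because nothing else runs concurrently at that point in boot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's struct rcu_head/srcu_struct. */
struct model_head {
	struct model_head *next;
};

struct model_ssp {
	bool initialized;              /* has srcu_init() run? */
	bool early_head_queued;        /* is early_head already on the list? */
	struct model_head early_head;  /* one head shared by all early calls */
	struct model_head *cblist;     /* simplified LIFO callback list */
	int n_queued;
};

/* Models call_srcu(): links @head into the list. Queuing a head that is
 * already linked corrupts the list; queuing it twice in a row makes it
 * point to itself. The assert catches only that simplest case. */
static void model_call_srcu(struct model_ssp *ssp, struct model_head *head)
{
	assert(ssp->cblist != head);
	head->next = ssp->cblist;
	ssp->cblist = head;
	ssp->n_queued++;
}

/* Hypothetical pre-init branch of start_poll_synchronize_srcu(). The
 * already-queued check runs only after the initialized check, and no
 * locking is used because boot is still single-threaded. */
static void model_start_poll_early(struct model_ssp *ssp)
{
	if (ssp->initialized)
		return;                /* post-init path not modeled here */
	if (!ssp->early_head_queued) {
		ssp->early_head_queued = true;
		model_call_srcu(ssp, &ssp->early_head);
	}
}
```

Calling model_start_poll_early() three times on a zeroed model_ssp leaves n_queued at 1, with early_head alone on the list.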

And yes, srcu_init() happens well before context switches are possible,
let alone workqueue scheduling. Famous last words...

Thanx, Paul