Re: [PATCH 08/10] rcu: Allow to deactivate nocb on a CPU

From: Paul E. McKenney
Date: Thu May 14 2020 - 11:47:11 EST


On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> On Wed, May 13, 2020 at 11:38:31AM -0700, Paul E. McKenney wrote:
> > On Wed, May 13, 2020 at 06:47:12PM +0200, Frederic Weisbecker wrote:
> > > +static void __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
> > > +{
> > > + unsigned long flags;
> > > + struct rcu_node *rnp = rdp->mynode;
> > > +
> > > + printk("De-offloading %d\n", rdp->cpu);
> > > + kthread_park(rdp->nocb_cb_kthread);
> >
> > I am a bit concerned about this, because from this point until the
> > end of this function, no RCU callbacks can be invoked for this CPU.
> > This could be a problem if there is a callback flood in progress, and
> > such callback floods are in fact one reason that the sysadm might want
> > to be switching from offloaded to non-offloaded. Is it possible to
> > move this kthread_park() to the end of this function? Yes, this can
> > result in concurrent invocation of different callbacks for this CPU,
> > but because we have excluded rcu_barrier(), that should be OK.
> >
> > Or is there some failure mode that I am failing to see? (Wouldn't
> > be the first time...)
>
> Heh I actually worried about that. Ok the issue is that it leaves
> a window where nocb_cb and local caller of rcu_do_batch() can
> compete but the local caller doesn't lock the nocb_lock.

Indeed, my nightmare scenario involves some sort of preemption or
similar long delay at that point. Callbacks pile up, and then OOM!

> So there are two ways to solve that.:
>
> 1) - set cblist->offloaded = 0 locally
> - From the kthread while calling rcu_do_batch():
> check the value of cblist->offloaded everytime after
> we call rcu_nocb_lock() and stop messsing with the
> cblist and return when we see cblist->offloaded == 0
> - Allow to handle cblist locally without taking the nocb_lock
> - Park kthread
>
> But there I'm worried about races. Imagine we have:
>
>
> Kthread Local
> -------- -------
> rcu_do_batch() {
> rcu_nocb_lock()
> do stuff with cblist
> rcu_nocb_unlock()
> rcu_nocb_lock()
> set cblist->offloaded = 0
> rcu_nocb_unlock()
> ===> INT or preemption
> rcu_do_batch() {
> do stuff with cblist
>
> Are we guaranteed that the Local CPU will see the updates from
> the kthread while calling rcu_do_batch()? I would tend to say
> yes but I'm not entirely sure...
>
> Oh wait, that solution also implies that we can't re-enqueue
> extracted callbacks if we spent took much time in threaded
> rcu_do_batch(), as the cblist may have been offloaded while
> we executed the extracted callbacks.
>
> That's a lot of corner cases to handle, which is why I prefer
> the other solution:
>
> 2) enum cblist_offloaded {
> CBLIST_NOT_OFFLOADED,
> CBLIST_(DE)OFFLOADING,
> CBLIST_OFFLOADED
> }
>
> - Locally set cblist->offloaded = CBLIST_DEOFFLOADING
> - From the kthread while calling rcu_do_batch(), do as
> usual.
> - Local CPU can call rcu_do_batch() and if it sees CBLIST_DEOFFLOADING,
> rcu_nocb_lock() will take the lock.
> - Park kthread
> - Locally set cblist->offloaded = CBLIST_NOT_OFFLOADED
> - Local calls to rcu_do_batch() won't take the lock anymore.

This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
to CBLIST_OFFLOADING of course needs to be on the CPU in question with
at least bh disabled. Probably best to be holding rcu_nocb_lock(),
but that might just be me being overly paranoid.

> > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > +{
> > > + struct rcu_data *rdp = arg;
> > > +
> > > + WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > + __rcu_nocb_rdp_deoffload(rdp);
> > > +
> > > + return 0;
> > > +}
> >
> > For example, is the problem caused by invocations of this
> > rcu_nocb_rdp_deoffload() function?
>
> How so?

It looked to me like it wasn't excluding either rcu_barrier() or CPU
hotplug. It might also not have been pinning onto the CPU in question,
but that might just be me misremembering. Then again, I didn't see a
call to it, so maybe its callers set things up appropriately.

OK, I will bite... What is the purpose of rcu_nocb_rdp_deoffload()? ;-)

> > But if so, do we really need to acquire rcu_state.barrier_mutex?
>
> Indeed it was probably not needed if we parked the kthread before
> anything, as we would have kept the callbacks ordering.
>
> But now if we allow concurrent callbacks execution during the small
> window, we'll need it.

Agreed! And I do believe that concurrent callback execution will
prove better than a possibly indefinite gap in callback execution.

> > That aside, if it is possible to do the switch without interrupting
> > callback invocation? Or is your idea that because we are always executing
> > on the CPU being deoffloaded, that CPU has been prevented from posting
> > callbacks in any case?
>
> No in the tiny window between kthread_park() and the irqs being disabled,
> the workqueue can be preempted and thus call_rcu() can be called anytime.

Agreed! ;-)

> > If the latter, does that mean that it is not
> > possible to deoffload offlined CPUs? (Not sure whether this restriction
> > is a real problem, but figured that I should ask.)
>
> Ah in the case of offlined CPUs I simply call the function directly from the CPU
> that disables the nocb remotely. So we remotely park the kthread (that we
> always do anyway) and set offlined.

And the cpus_read_lock() in rcu_nocb_cpu_deoffload() prevents that CPU
from coming back online, so looks good!

Thanx, Paul