Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints

From: Paul E. McKenney
Date: Tue May 26 2020 - 20:58:09 EST


On Tue, May 26, 2020 at 08:45:42PM -0400, Joel Fernandes wrote:
> Hi Paul,
>
> On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > > > > the burden of doing so to the callers). Thoughts?
> > > > > >
> > > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > > scheduler often has tighter latency constraints than RCU does. So I
> > > > > > think that is a better question for the scheduler maintainers than it
> > > > > > is for me. ;-)
> > > > >
> > > > > > Ok, it definitely keeps coming up on my radar, first with the
> > > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > > > > be good for a conference discussion!
> > > >
> > > > Again, please understand that RCU has way looser latency constraints
> > > > than the scheduler does. Adding half a jiffy to wakeup latency might
> > > > not go over well, especially in the real-time application area.
> > >
> > > Yeah, agreed that the "deadlock detection" code should be pretty lightweight
> > > if/when it is written.
> >
> > In addition, to even stand a chance, you would need to use hrtimers.
> > The half-jiffy (at a minimum) delay from any other deferral mechanism
> > that I know of would be the kiss of death, especially from the viewpoint
> > of the real-time guys.
>
> Just to make sure we are talking about the same kind of overhead - the
> deferring is only needed if the rq lock is already held (detected by
> trylocking). So there's no overhead in the common case other than the
> trylock possibly being slightly more expensive than the regular
> locking. Also, once the scheduler defers it, it uses the same kind of
> mechanism that other deferral mechanisms use to overcome this deadlock
> (timers, irq_work etc), so the overhead then would be no different
> than what we have now - the RT users would already have the wake up
> latency in current kernels without this idea implemented. Did I miss
> something?

Aggressive real-time applications care deeply about the uncommon case.

Thanx, Paul

> > > > But what did the scheduler maintainers say about this idea?
> > >
> > > Last I remember when it came up during the rcu_read_unlock_special() deadlock
> > > discussions, there's no way for infra like RCU to know that it was
> > > invoked from the scheduler.
> > >
> > > The idea I am bringing up now (about the scheduler itself detecting a
> > > recursion) was never brought up (not yet) with the sched maintainers (at
> > > least not by me).
> >
> > It might be good to bounce it off of them sooner rather than later.
>
> Ok, I did that now over IRC. Thank you!
>
> - Joel
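
For readers following the thread, the trylock-and-defer idea Joel describes can be illustrated with a small userspace sketch. It is only an analogy under stated assumptions: a pthread mutex stands in for the rq lock, an atomic counter stands in for a timer or irq_work deferral, and every name (`fake_rq`, `fake_wake_up`, `fake_run_deferred`) is hypothetical rather than a scheduler API.

```c
/*
 * Userspace analogy only, NOT kernel code. All names are hypothetical.
 * A pthread mutex plays the role of the rq lock; a counter plays the
 * role of a timer/irq_work deferral queue.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_rq {
	pthread_mutex_t lock;    /* stand-in for the rq lock */
	int nr_woken;            /* wakeups completed, protected by lock */
	atomic_int nr_deferred;  /* wakeups postponed because lock was busy */
};

static void fake_rq_init(struct fake_rq *rq)
{
	pthread_mutex_init(&rq->lock, NULL);
	rq->nr_woken = 0;
	atomic_init(&rq->nr_deferred, 0);
}

/*
 * Try to wake up immediately. Returns true if the wakeup completed,
 * false if it was deferred. Note that trylock also fails when another
 * thread merely holds the lock, so contention is deferred as well.
 */
static bool fake_wake_up(struct fake_rq *rq)
{
	if (pthread_mutex_trylock(&rq->lock) != 0) {
		/* Lock busy: blocking here could self-deadlock, so defer. */
		atomic_fetch_add(&rq->nr_deferred, 1);
		return false;
	}
	rq->nr_woken++;
	pthread_mutex_unlock(&rq->lock);
	return true;
}

/*
 * Deferred path: meant to run later from a context that does not hold
 * the lock (in the kernel discussion, a timer or irq_work handler).
 */
static void fake_run_deferred(struct fake_rq *rq)
{
	int pending = atomic_exchange(&rq->nr_deferred, 0);

	pthread_mutex_lock(&rq->lock);
	rq->nr_woken += pending;
	pthread_mutex_unlock(&rq->lock);
}
```

With the lock free, `fake_wake_up()` completes the wakeup on the spot; with the lock held, it only bumps the deferred count, and the wakeup lands when `fake_run_deferred()` runs. That delay in the deferred path is the uncommon-case latency Paul is concerned about.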