Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
From: Joel Fernandes
Date: Tue May 26 2020 - 20:46:03 EST
Hi Paul,
On Tue, May 26, 2020 at 6:29 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Tue, May 26, 2020 at 05:27:56PM -0400, Joel Fernandes wrote:
> > On Tue, May 26, 2020 at 02:09:47PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > > > BTW, I'm really itching to give it a try to make the scheduler more deadlock
> > > > > > resilient (that is, if the scheduler wake up path detects a deadlock, then it
> > > > > > defers the wake up using timers, or irq_work on its own instead of passing
> > > > > > the burden of doing so to the callers). Thoughts?
> > > > >
> > > > > I have used similar approaches within RCU, but on the other hand the
> > > > > scheduler often has tighter latency constraints than RCU does. So I
> > > > > think that is a better question for the scheduler maintainers than it
> > > > > is for me. ;-)
> > > >
> > > > Ok, it definitely keeps coming up on my radar first with the
> > > > rcu_read_unlock_special() stuff, and now the nocb ;-). Perhaps it could also
> > > > be good for a conference discussion!
> > >
> > > Again, please understand that RCU has way looser latency constraints
> > > than the scheduler does. Adding half a jiffy to wakeup latency might
> > > not go over well, especially in the real-time application area.
> >
> > Yeah, agreed that the "deadlock detection" code should be pretty light weight
> > if/when it is written.
>
> In addition, to even stand a chance, you would need to use hrtimers.
> The half-jiffy (at a minimum) delay from any other deferral mechanism
> that I know of would be the kiss of death, especially from the viewpoint
> of the real-time guys.
Just to make sure we are talking about the same kind of overhead - the
deferring is only needed if the rq lock is already held (detected by
trylocking). So there's no overhead in the common case other than the
trylock possibly being slightly more expensive than the regular
locking. Also, once the scheduler defers it, it uses the same kind of
mechanism that other deferral mechanisms use to overcome this deadlock
(timers, irq_work etc), so the overhead then would be no different
than what we have now - the RT users would already have the wake up
latency in current kernels without this idea implemented. Did I miss
something?
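
To make the shape of the idea concrete, here is a rough userspace sketch,
not kernel code. A pthread mutex stands in for the rq lock, and rq_lock,
try_wake_up() and the two counters are made-up names for illustration only.
A real version would queue irq_work or an hrtimer where the sketch just
bumps a counter.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runqueue lock; not a kernel API. */
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Illustration-only counters showing which path each wakeup took. */
static int direct_wakeups;
static int deferred_wakeups;

/*
 * Attempt a wakeup. Returns true if it was done directly, false if it
 * had to be deferred because the lock was already held.
 *
 * Note: a failed trylock can also mean plain contention from another
 * thread, so this check is conservative, not a precise recursion test.
 */
static bool try_wake_up(void)
{
	if (pthread_mutex_trylock(&rq_lock) != 0) {
		/* Lock is held: hand off to the deferral path. */
		deferred_wakeups++;
		return false;
	}

	/* Common case: lock was free, do the wakeup right away. */
	direct_wakeups++;
	pthread_mutex_unlock(&rq_lock);
	return true;
}
```

Calling try_wake_up() with rq_lock free takes the direct path; calling it
while rq_lock is already held returns false and only counts a deferral, so
the extra cost is confined to the case that would otherwise deadlock.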
> > > But what did the scheduler maintainers say about this idea?
> >
> > Last I remember when it came up during the rcu_read_unlock_special() deadlock
> > discussions, there's no way for infra like RCU to know that it was
> > invoked from the scheduler.
> >
> > The idea I am bringing up now (about the scheduler itself detecting a
> > recursion) was never brought up (not yet) with the sched maintainers (at
> > least not by me).
>
> It might be good to bounce it off of them sooner rather than later.
Ok, I did that now over IRC. Thank you!
- Joel