Re: [RFC PATCH RT 4/4] rcutorture: Avoid problematic critical section nesting

From: Paul E. McKenney
Date: Thu Jun 27 2019 - 16:51:01 EST


On Thu, Jun 27, 2019 at 03:16:09PM -0500, Scott Wood wrote:
> On Thu, 2019-06-27 at 11:00 -0700, Paul E. McKenney wrote:
> > On Wed, Jun 26, 2019 at 11:49:16AM -0500, Scott Wood wrote:
> > > On Wed, 2019-06-26 at 11:08 -0400, Steven Rostedt wrote:
> > > > On Fri, 21 Jun 2019 16:59:55 -0700
> > > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > I have no objection to the outlawing of a number of these sequences
> > > > > in mainline, but am rather pointing out that until they really are
> > > > > outlawed and eliminated, rcutorture must continue to test them in
> > > > > mainline. Of course, an rcutorture running in -rt should avoid
> > > > > testing things that break -rt, including these sequences.
> > > >
> > > > We should update lockdep to complain about these sequences. That would
> > > > "outlaw" them in mainline. That is, after we clean up all the current
> > > > sequences in the code. And we also need to get Linus's approval of
> > > > this, as I believe he was against enforcing this in the past.
> > >
> > > Was the opposition to prohibiting some specific sequence? It's only
> > > certain misnesting scenarios that are problematic. The rcu_read_lock/
> > > local_irq_disable restriction can be dropped with the IPI-to-self
> > > added in Paul's tree. Are there any known instances of the other two
> > > (besides rcutorture)?

If by IPI-to-self you mean the IRQ work trick, that isn't implemented
across all architectures yet, is it?

> > Given the failure scenario Sebastian Siewior reported today, there
> > apparently are some, at least when running threaded interrupt handlers.
>
> That's the rcu misnesting, which it looks like we can allow with the IPI-to-
> self; I was asking about the other two. I suppose if we really need to, we
> could work around preempt_disable()/local_irq_disable()/preempt_enable()/
> local_irq_enable() by having preempt_enable() do an IPI-to-self if
> need_resched is set and IRQs are disabled. The RT local_bh_disable()
> atomic/non-atomic misnesting would be more difficult, but I don't think
> impossible. I've got lazy migrate disable working (initially as an attempt
> to deal with misnesting but it turned out to give a huge speedup as well;
> will send as soon as I take care of a loose end in the deadline scheduler);
> it's possible that something similar could be done with the softirq lock
> (and given that I saw a slowdown when that lock was introduced, it may also
> be worth doing just for performance).
>
> BTW, it's not clear to me whether the failure Sebastian saw was due to the
> bare irq disabled version, which was what I was talking about prohibiting
> (he didn't show the context that was interrupted). The version where
> preempt is disabled (with or without irqs being disabled inside the preempt
> disabled region) definitely happens and is what I was trying to address with
> patch 3/4.

I don't claim to yet fully understand what Sebastian was seeing, though
I am obviously hoping that my local experiments showing it to be fixed
in current -rcu hold true.

Why not simply make rcutorture check whether it is running in a
PREEMPT_RT_FULL environment and avoid the PREEMPT_RT_FULL-unfriendly
testing only in that case? This could be done compatibly with mainline by
adding another rcutorture module parameter that suppressed the problematic
testing, disabled by default. Such a patch could be accepted into
mainline, and then -rt could have a very small patch that changed the
default to enabled for CONFIG_PREEMPT_RT_FULL=y kernels.

And should we later get to a place where the PREEMPT_RT_FULL-unfriendly
scenarios are prohibited across all kernel configurations, then the module
parameter can be removed. Again, until we know (as opposed to suspect)
that these scenarios really don't happen, mainline rcutorture must
continue testing them.

Thanx, Paul