Re: [RFC PATCH RT 4/4] rcutorture: Avoid problematic critical section nesting

From: Paul E. McKenney
Date: Thu Jun 27 2019 - 20:53:40 EST


On Thu, Jun 27, 2019 at 05:46:27PM -0500, Scott Wood wrote:
> On Thu, 2019-06-27 at 13:50 -0700, Paul E. McKenney wrote:
> > On Thu, Jun 27, 2019 at 03:16:09PM -0500, Scott Wood wrote:
> > > On Thu, 2019-06-27 at 11:00 -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 26, 2019 at 11:49:16AM -0500, Scott Wood wrote:
> > > > > >
> > > > > > On Fri, 21 Jun 2019 16:59:55 -0700
> > > > > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > I have no objection to the outlawing of a number of these
> > > > > > > sequences
> > > > > > > in
> > > > > > > mainline, but am rather pointing out that until they really are
> > > > > > > outlawed
> > > > > > > and eliminated, rcutorture must continue to test them in
> > > > > > > mainline.
> > > > > > > Of course, an rcutorture running in -rt should avoid testing
> > > > > > > things
> > > > > > > that
> > > > > > > break -rt, including these sequences.
> > > > > >
> > > > > > sequences in the code. And we also need to get Linus's approval of
> > > > > > this
> > > > > > as I believe he was against enforcing this in the past.
> > > > >
> > > > > Was the opposition to prohibiting some specific sequence? It's only
> > > > > certain
> > > > > misnesting scenarios that are problematic. The rcu_read_lock/
> > > > > local_irq_disable restriction can be dropped with the IPI-to-self
> > > > > added
> > > > > in
> > > > > Paul's tree. Are there any known instances of the other two
> > > > > (besides
> > > > > rcutorture)?
> >
> > If by IPI-to-self you mean the IRQ work trick, that isn't implemented
> > across all architectures yet, is it?
>
> Right... smp_send_reschedule() has wider coverage, but even then there's
> some hardware that just can't do it reasonably (e.g. pre-APIC x86).

Except that smp_send_reschedule() won't do anything unless the scheduler
things something needs to be done, as it its wake list is non-empty.
Which might explain why Peter Zijlstra didn't suggest it.

> So I
> guess the options are:
>
> 1. Accept that such hardware might experience delayed grace period
> completion in certain configurations,
> 2. Have such hardware check for need_resched in local_irq_enable() (not nice
> if sharing a kernel build with hardware that doesn't need it), or
> 3. Forbid the sequence (enforced by debug checks). Again, this would only
> prohibit rcu_read_lock()/local_irq_disable()/rcu_read_unlock()/
> local_irq_enable() *without* preempt disabling around the IRQ-disabled
> region.

4. If further testing continues to show it to be reliable, continue
using the scheme in -rcu.
5. Use a short-duration hrtimer to get a clean environment in short
order. Yes, the timer might fire while preemption and/or softirqs
are disabled, but then the code can rely on the following
preempt_enable(), local_bh_enable(), or whatever. This condition
should be sufficiently rare to avoid issues with hrtimer overhead.
6. Use smp_call_function_single() to IPI some other poor slob of a
CPU, which then does the same back. Non-waiting version in both
cases, of course.

Probably others as well.

> > Why not simply make rcutorture cyheck whether it is running in a
> > PREEMPT_RT_FULL environment and avoid the PREEMPT_RT_FULL-unfriendly
> > testing only in that case?
> >
> > And should we later get to a place where the PREEMPT_RT_FULL-unfriendly
> > scenarios are prohibited across all kernel configurations, then the module
> > parameter can be removed. Again, until we know (as opposed to suspect)
> > that these scenarios really don't happen, mainline rcutorture must
> > continue testing them.
>
> Yes, I already acknowledged that debug checks detecting the sequences should
> come before the test removal

OK, good to hear. As you may have noticed, I was getting the impression
that you might have changed your mind on this point. ;-)

> (including this patch as an RFC at this point
> was mainly meant as a demonstration of what's needed to get rcutorture to
> pass), but it'd be nice to have some idea of whether there would be
> opposition to the concept before coding up the checks. I'd rather not
> continue the state of "these sequences can blow up on RT and we don't know
> if they exist or not" any longer than necessary. Plus, only one of the
> sequences is exclusively an RT issue (though it's the one with the worst
> consequences).

Steve Rostedt's point about enlisting the aid of lockdep seems worth
looking into.

Thanx, Paul