Re: [PATCH tip/core/rcu 06/19] rcu: Add warning to detect half-interrupts

From: Joel Fernandes
Date: Fri Mar 15 2019 - 09:46:25 EST


On Fri, Mar 15, 2019 at 04:44:52PM +0900, Byungchul Park wrote:
> On 03/15/2019 04:31 PM, Byungchul Park wrote:
> > On Mon, Mar 11, 2019 at 09:39:39AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 29, 2018 at 03:20:34PM -0700, Paul E. McKenney wrote:
> > > > RCU's dyntick-idle code is written to tolerate half-interrupts, that it,
> > > > either an interrupt that invokes rcu_irq_enter() but never invokes the
> > > > corresponding rcu_irq_exit() on the one hand, or an interrupt that never
> > > > invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
> > > > on the other. These things really did happen at one time, as evidenced
> > > > by this ca-2011 LKML post:
> > > >
> > > > http://lkml.kernel.org/r/20111014170019.GE2428@xxxxxxxxxxxxxxxxxx
> > > >
> > > > The reason why RCU tolerates half-interrupts is that usermode helpers
> > > > used exceptions to invoke a system call from within the kernel such that
> > > > the system call did a normal return (not a return from exception) to
> > > > the calling context. This caused rcu_irq_enter() to be invoked without
> > > > a matching rcu_irq_exit(). However, usermode helpers have since been
> > > > rewritten to make much more housebroken use of workqueues, kernel threads,
> > > > and do_execve(), and therefore should no longer produce half-interrupts.
> > > > No one knows of any other source of half-interrupts, but then again,
> > > > no one seems insane enough to go audit the entire kernel to verify that
> > > > half-interrupts really are a relic of the past.
> > > >
> > > > This commit therefore adds a pair of WARN_ON_ONCE() calls that will
> > > > trigger in the presence of half interrupts, which the code will continue
> > > > to handle correctly. If neither of these WARN_ON_ONCE() trigger by
> > > > mid-2021, then perhaps RCU can stop handling half-interrupts, which
> > > > would be a considerable simplification.
> > > Hi Paul and everyone,
> > > I was thinking some more about this patch and whether we can simplify this code
> > > much in 2021. Since 2021 is a bit far away, I thought working on it in again to
> > > keep it fresh in memory is a good idea ;-)
> > >
> > > To me it seems we cannot easily combine the counters (dynticks_nesting and
> > > dynticks_nmi_nesting) even if we confirmed that there is no possibility of a
> > > half-interrupt scenario (assuming simplication means counter combining like
> > > Byungchul tried to do in https://goo.gl/X1U77X). The reason is because these
> > > 2 counters need to be tracked separately as they are used differently in the
> > > following function:
> > Hi Joel and Paul,
> >
> > I always love the way to logically approach problems so I'm a fan of
> > all your works :) But I'm JUST curious about something here. Why can't
> > we combine them the way I tried even if we confirm no possibility of
> > half-interrupt? IMHO, the only thing we want to know through calling
> > rcu_is_cpu_rrupt_from_idle() is whether the interrupt comes from
> > RCU-idle or not - of course assuming the caller context always be an
> > well-defined interrupt context like e.g. the tick handler.
> >
> > So the function can return true if the caller is within a RCU-idle
> > region except a well-known single interrupt nested.
> >
> > Of course, now that we cannot confirm it yet, the crowbar is necessary.
> > But does it still have a problem even after confirming it? Why? What am
> > I missing? Could you explain why for me? :(
>
>
> Did you also want to consider the case the function is called from others
> than
> well-known interrupt contexts? If yes, then I agree with you, there doesn't
> seem to be the kind of code and it's not a good idea to let the function be
> called generally though.

We were discussing exactly this on the thread. I am going to be adding a
lockdep check to make sure the function isn't called from anywhere but an
interrupt context. Then once we can confirm that there are no more
half-interrupts in the future, we can apply your counter combining approach.

Based on the comments on the rrupt_from_idle() function, it wasn't clear to
me if the function was intended to be called from any context. That's why I
thought the counter combing approach would not work, but you are right - it
should work.

I will CC you on the lockdep check patch. Also hope you are subscribed to
the new shiny rcu@xxxxxxxxxxxxxxx list as well :-)

thanks,

- Joel