Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code
From: Paul E. McKenney
Date: Mon May 08 2017 - 18:36:14 EST
On Mon, May 08, 2017 at 05:16:09PM -0500, Josh Poimboeuf wrote:
> On Mon, May 08, 2017 at 02:07:54PM -0700, Paul E. McKenney wrote:
> > On Mon, May 08, 2017 at 03:43:33PM -0500, Josh Poimboeuf wrote:
> > > On Mon, May 08, 2017 at 01:15:58PM -0700, Paul E. McKenney wrote:
> > > > On Mon, May 08, 2017 at 02:47:29PM -0500, Josh Poimboeuf wrote:
> > > > > On Mon, May 08, 2017 at 03:13:22PM -0400, Steven Rostedt wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > > If rcu is not watching, calling rcu_enter_irq() will have it watch
> > > > > > again. Even in NMI context I believe.
> > > > >
> > > > > What if you get an NMI while running in rcu_dynticks_eqs_enter() before
> > > > > it increments rdtp->dynticks? Will rcu_enter_irq() still work from the
> > > >                                        rcu_irq_enter()
> > > > > NMI?
> > > >
> > > > The rcu_nmi_enter() function will notice that RCU is not watching, and
> > > > will therefore atomically increment RCU's dynticks-idle counter, which
> > > > will be atomically incremented again upon return. Since the bottom bit
> > > > of this counter controls whether or not RCU is watching, RCU will be
> > > > watching during the NMI, will stop watching upon return from the NMI,
> > > > which restores state so as to allow rcu_irq_enter() to cause RCU to once
> > > > again watch. (NMI algorithm due to Andy Lutomirski.)
> > > >
> > > > > I'm just trying to understand what are the cases where rcu_enter_irq()
> > > > > *doesn't* work from an ftrace handler.
> > > >
> > > > It doesn't work from an NMI handler. Aside from possible architecture
> > > > specific special cases, it should work everywhere else.
> > >
> > > Ok, so just to clarify. Is there a bug in the ftrace stack tracer in
> > > the following situation?
> > >
> > > 1. RCU isn't watching
> > > 2. An NMI hits
> > > 3. ist_enter() calls into the ftrace stack tracer, before
> > >    rcu_nmi_enter() is called, so RCU isn't watching yet
> > > 4. The ftrace stack tracer calls rcu_irq_enter(), which has no effect,
> > >    so RCU still isn't watching
> > > 5. Hilarity ensues in the ftrace stack tracer
> >
> > This would be a problem if step 2's NMI hit rcu_irq_enter(),
> > rcu_irq_exit(), and friends in just the wrong place.
> >
> > I would suggest that ftrace() do something like this...
> >
> > 	if (in_nmi())
> > 		rcu_nmi_enter();
> > 	else
> > 		rcu_irq_enter();
> >
> > Except that, as Steven will quickly point out, this won't work at the
> > very edges of the NMI, when NMI_MASK won't be set in preempt_count().
> >
> > Other thoughts?
>
> Ok. So I think the livepatch ftrace handler would need the in_nmi()
> check, in case it's called early in the NMI.
>
> But on x86, rcu_nmi_enter() is also called in some non-NMI exception
> cases, from ist_enter(). So it appears that the in_nmi() check wouldn't
> be sufficient. We might instead need something like:
>
> 	if (in_nmi() || in_some_other_exception())
> 		rcu_nmi_enter();
> 	else
> 		rcu_irq_enter();
>
> But unfortunately the in_some_other_exception() function doesn't
> currently exist.
>
> So, one more question. Would it work if we just always called
> rcu_nmi_enter()?

I am a bit nervous about this. It would -at- -least- be necessary to have
interrupts disabled throughout the entire time from the rcu_nmi_enter()
through the matching rcu_nmi_exit(). And there might be other failure
modes that I don't immediately see.

But do we really need this, given the in_nmi() check that Steven
pointed out?
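
For reference, the counter behavior discussed earlier in this thread can
be modeled in user space roughly as follows. This is only a sketch under
stated assumptions: the bottom bit of the counter says whether RCU is
watching, NMI entry increments the counter only when RCU is not already
watching, and the outermost NMI exit increments it again. The toy_*
names, the struct, and the nesting counter are hypothetical and are not
the kernel's rcu_nmi_enter()/rcu_nmi_exit() code. Per-CPU state, memory
ordering, and interrupt masking are all left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's dynticks state (not the kernel's). */
struct toy_dynticks {
	atomic_uint dynticks;	/* bottom bit set: RCU is watching */
	int nmi_nesting;	/* 0 when no NMI-like handler is active */
};

/* RCU is "watching" in this model when the counter's bottom bit is set. */
static bool toy_rcu_is_watching(struct toy_dynticks *t)
{
	return (atomic_load(&t->dynticks) & 1u) != 0;
}

/*
 * NMI entry: if RCU is not watching, atomically increment the counter so
 * that it is.  The nesting count goes up by 1 in that case and by 2
 * otherwise, so a value of exactly 1 on exit identifies the handler that
 * turned watching on.
 */
static void toy_nmi_enter(struct toy_dynticks *t)
{
	int incby = 2;

	if (!toy_rcu_is_watching(t)) {
		atomic_fetch_add(&t->dynticks, 1);
		incby = 1;
	}
	t->nmi_nesting += incby;
}

/*
 * NMI exit: only the outermost handler that turned watching on increments
 * the counter again, which restores the "not watching" state.
 */
static void toy_nmi_exit(struct toy_dynticks *t)
{
	if (t->nmi_nesting != 1) {
		t->nmi_nesting -= 2;
		return;
	}
	t->nmi_nesting = 0;
	atomic_fetch_add(&t->dynticks, 1);
}
```

In this model, an entry/exit pair that starts from "not watching" leaves
the counter advanced by two with the bottom bit clear again, while a pair
that starts from "watching" leaves the counter untouched.
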
Thanx, Paul