Re: [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace

From: Paul E. McKenney
Date: Fri May 12 2017 - 16:32:07 EST


On Fri, May 12, 2017 at 04:05:32PM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 11:50:03 -0700
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, May 12, 2017 at 02:36:19PM -0400, Steven Rostedt wrote:
> > > On Fri, 12 May 2017 11:25:35 -0700
> > > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > On Fri, May 12, 2017 at 01:15:45PM -0400, Steven Rostedt wrote:
> > > > > From: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
> > > > >
> > > > > As stack tracing now requires "rcu watching", force RCU to be watching when
> > > > > recording a stack trace.
> > > > >
> > > > > Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
> > > >
> > > > Assuming that you never get to __trace_stack() if in an NMI handler,
> > > > this looks good to me!
> > > >
> > > > In contrast, if __trace_stack() ever is called from an NMI handler,
> > > > invoking rcu_irq_enter() can be fatal.
> > >
> > > Then someone may die.
> > >
> > > OK, what's the case of running this in nmi? How does perf do it?
> >
> > I have no idea. If it cannot happen, then it cannot happen and all
> > is well, RCU is happy, and I am happy. ;-)
> >
> > > Do we just skip the check if it is in an nmi?
> > >
> > > 	if (!in_nmi()) {
> > > 		if (unlikely(rcu_irq_enter_disabled()))
> > > 			return;
> > > 		rcu_irq_enter();
> > > 	}
> > >
> > > 	__ftrace_trace_stack();
> > >
> > > 	if (!in_nmi())
> > > 		rcu_irq_exit();
> > >
> > > ?
> >
> > If it -can- happen, bail out of the function without doing the
>
> Why?
>
> > __ftrace_trace_stack()? Or does that just cause other problems further
> > down the road? Or BUG_ON(in_nmi())?
>
> Why?
>
> > But again if it cannot happen, no problem and no need for extra code.
>
> We can't call stack trace from nmi anymore? It calls rcu_read_lock()
> which is why we need to make sure rcu is watching, otherwise lockdep
> complains.

Ah, finally got it! If we are in_nmi(), you are relying on the
NMI handler's call to rcu_nmi_enter(), which works. The piece I was
forgetting was that you also recently said in an unrelated LKML thread
that all the functions called at the very beginnings and ends of NMI
handlers (which can see !in_nmi()) are marked notrace, so that should
be covered as well.
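
To spell out that reasoning, here is a rough userspace model of the
pattern you proposed. It is only a sketch: every function below is a
stub that I made up to stand in for the kernel primitive of similar
name, the model_*() helpers and the counters are hypothetical, and the
NMI entry/exit ordering is simplified. It shows that the stack trace
always runs with RCU watching, either via rcu_irq_enter() or via the
NMI handler's own rcu_nmi_enter().

```c
/* Userspace model only. Nothing here is kernel code; all functions are stubs. */
#include <assert.h>
#include <stdbool.h>

bool nmi_context;            /* what the stub in_nmi() reports */
bool irq_enter_disabled;     /* what the stub rcu_irq_enter_disabled() reports */
int rcu_watching_depth;      /* > 0 models "RCU is watching" */
int stack_traces_recorded;   /* number of stack traces taken so far */

bool in_nmi(void) { return nmi_context; }
bool rcu_irq_enter_disabled(void) { return irq_enter_disabled; }
void rcu_irq_enter(void) { rcu_watching_depth++; }

void rcu_irq_exit(void)
{
	assert(rcu_watching_depth > 0);
	rcu_watching_depth--;
}

/* Hypothetical helper: NMI entry, where rcu_nmi_enter() makes RCU watch. */
void model_nmi_enter(void)
{
	nmi_context = true;
	rcu_watching_depth++;
}

/* Hypothetical helper: NMI exit, undoing model_nmi_enter(). */
void model_nmi_exit(void)
{
	assert(rcu_watching_depth > 0);
	rcu_watching_depth--;
	nmi_context = false;
}

/* Stand-in for the real stack tracer, which needs RCU to be watching. */
void model_ftrace_trace_stack(void)
{
	assert(rcu_watching_depth > 0);
	stack_traces_recorded++;
}

/* The proposed pattern: skip rcu_irq_enter()/rcu_irq_exit() when in NMI. */
void model_trace_stack(void)
{
	if (!in_nmi()) {
		if (rcu_irq_enter_disabled())
			return;		/* cannot make RCU watch, so no trace */
		rcu_irq_enter();
	}

	model_ftrace_trace_stack();

	if (!in_nmi())
		rcu_irq_exit();
}
```

Calling model_trace_stack() outside of the model's NMI context takes
the rcu_irq_enter() path, and calling it between model_nmi_enter() and
model_nmi_exit() relies on the NMI path instead. In both cases the
assertion in model_ftrace_trace_stack() holds and the depth counter
returns to its starting value.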

So never mind! (And thank you for the explanation.)

Thanx, Paul