Re: [RFC] ftrace / perf 'recursion'

From: Peter Zijlstra
Date: Wed Aug 17 2016 - 10:57:18 EST


On Wed, Aug 17, 2016 at 10:25:59AM -0400, Steven Rostedt wrote:

> > > Also, it will prevent any tracing of NMIs that occur in there.
> >
> > It should not, see how I only mark the IRQ bit, not the NMI bit.
>
> Ah, I didn't look deep at what you set there. Maybe that would work.
> Still pretty hacky.

Sure :-)

> > > I would really like to keep this fix within perf if possible. If
> > > anything, the flag should just tell the perf function handler not to
> > > trace; it shouldn't stop all function handlers.
> >
> > Well, my thinking was that there's a reason most of irq_work is already
> > notrace. kernel/irq_work.c has CC_FLAGS_FTRACE removed. That seems to
> > suggest that tracing irq_work is a problem.
>
> Well, you were the one that added that ;-)

OK, I suppose I can do the same for perf only, which is basically the
first patch on this thread. And then remove the notrace muck for
irq_work.c.

> Are you calling a signal to userspace via the irq work? Maybe we should
> have a kernel thread that does that instead. That way, the irq works
> can be suspended until the kernel thread gets to run. Then even though
> the waking of the thread will cause more events, it will be spaced out
> enough not to cause an irq work storm.

Nah, that'd wreck the desired semantics. We could maybe use a task_work
for the signal cruft though, and only generate the signal on the return
to userspace. But I'm not sure that will cure the problem.

We'd still need the irq_work to wake tasks stuck in poll() and friends.
And once we're over the watermark, every new event will trigger that
wakeup, and the wakeup will generate a new event, etc.