Re: [patch 4/8] x86/entry: Move irq tracing on syscall entry to C-code

From: Paul E. McKenney
Date: Sun Mar 01 2020 - 13:26:09 EST


On Sun, Mar 01, 2020 at 07:12:25PM +0100, Thomas Gleixner wrote:
> Andy Lutomirski <luto@xxxxxxxxxx> writes:
> > On Sun, Mar 1, 2020 at 7:21 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> >> >> On Mar 1, 2020, at 2:16 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> >> Ok, but for the time being anything before/after CONTEXT_KERNEL is unsafe
> >> >> except trace_hardirqs_off/on() as those trace functions do not allow
> >> >> anything to be attached AFAICT.
> >> >
> >> > Can you point to whatever makes those particular functions special? I
> >> > failed to follow the macro maze.
> >>
> >> Those are not tracepoints and not going through the macro maze. See
> >> kernel/trace/trace_preemptirq.c
> >
> > That has:
> >
> > void trace_hardirqs_on(void)
> > {
> > 	if (this_cpu_read(tracing_irq_cpu)) {
> > 		if (!in_nmi())
> > 			trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > 		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
> > 		this_cpu_write(tracing_irq_cpu, 0);
> > 	}
> >
> > 	lockdep_hardirqs_on(CALLER_ADDR0);
> > }
> > EXPORT_SYMBOL(trace_hardirqs_on);
> > NOKPROBE_SYMBOL(trace_hardirqs_on);
> >
> > But this calls trace_irq_enable_rcuidle(), and that's the part of the
> > macro maze I got lost in. I found:
> >
> > #ifdef CONFIG_TRACE_IRQFLAGS
> > DEFINE_EVENT(preemptirq_template, irq_disable,
> > 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
> > 	     TP_ARGS(ip, parent_ip));
> >
> > DEFINE_EVENT(preemptirq_template, irq_enable,
> > 	     TP_PROTO(unsigned long ip, unsigned long parent_ip),
> > 	     TP_ARGS(ip, parent_ip));
> > #else
> > #define trace_irq_enable(...)
> > #define trace_irq_disable(...)
> > #define trace_irq_enable_rcuidle(...)
> > #define trace_irq_disable_rcuidle(...)
> > #endif
> >
> > But the DEFINE_EVENT doesn't have the "_rcuidle" part. And that's
> > where I got lost in the macro maze. I looked at the gcc asm output,
> > and there is, indeed:
>
> DEFINE_EVENT
>   DECLARE_TRACE
>     __DECLARE_TRACE
>       __DECLARE_TRACE_RCU
>         static inline void trace_##name##_rcuidle(proto)
>           __DO_TRACE
>             if (rcuidle)
>               ....
>
> > But I also don't see why this is any different from any other tracepoint.
>
> Indeed. I took a wrong turn at some point in the macro jungle :)
>
> So tracing itself is fine, but then if you have probes or bpf programs
> attached to a tracepoint these use rcu_read_lock()/unlock() which is
> obviously wrong in rcuidle context.

Definitely, any such code needs to use tricks similar to those used by
the tracing code. Or instead use something like SRCU, which is OK with
readers from idle. Or use something like Steve Rostedt's workqueue-based
approach, though please be very careful with the latter, lest the
battery-powered embedded guys come after you for waking up idle CPUs
too often. ;-)
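
To make the first option concrete, here is a purely illustrative userspace
toy, not kernel code. Every toy_* name is made up for this sketch, and the
model assumes only that an RCU reader is legal while RCU is "watching" the
CPU. The rcuidle-style wrapper tells RCU to watch for the duration of the
probe, which is roughly the kind of trick meant above:

```c
#include <assert.h>

/* Toy userspace model. Every toy_* name is hypothetical, not a kernel API. */

/* Nesting count: 0 means this "CPU" is idle from RCU's point of view. */
int toy_rcu_watching;

/* Number of read-side sections entered while RCU was not watching. */
int toy_illegal_readers;

/* Tell the toy RCU to start watching this CPU (nestable). */
void toy_rcu_irq_enter(void)
{
	toy_rcu_watching++;
}

/* Undo one toy_rcu_irq_enter(). */
void toy_rcu_irq_exit(void)
{
	assert(toy_rcu_watching > 0);
	toy_rcu_watching--;
}

/* Models a reader: it is only legal while RCU is watching. */
void toy_rcu_read_lock(void)
{
	if (toy_rcu_watching == 0)
		toy_illegal_readers++;
}

void toy_rcu_read_unlock(void)
{
}

/* A probe that, like the attached probes discussed above, is an RCU reader. */
void toy_probe(void)
{
	toy_rcu_read_lock();
	toy_rcu_read_unlock();
}

/* Calls the probe directly, which is wrong if the CPU is idle. */
void toy_call_probe_plain(void)
{
	toy_probe();
}

/* Brackets the probe so that RCU is watching while it runs. */
void toy_call_probe_rcuidle(void)
{
	toy_rcu_irq_enter();
	toy_probe();
	toy_rcu_irq_exit();
}
```

Starting from the idle state, toy_call_probe_plain() bumps
toy_illegal_readers, while toy_call_probe_rcuidle() does not and leaves
the nesting count back at zero afterwards.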

Thanx, Paul