Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

From: Peter Zijlstra
Date: Mon Sep 25 2017 - 04:34:48 EST


On Fri, Sep 22, 2017 at 11:28:30AM -0700, Joel Fernandes wrote:
> >> + TP_printk("caller=%pF parent=%pF",
> >> + (void *)((unsigned long)(_stext) + __entry->caller_offs),
> >> + (void *)((unsigned long)(_stext) + __entry->parent_offs))
> >> +);
> >
> > So I don't get that function prototype. Why do we need two IPs?
>
> This follows the exact same "model" used by the preempt/irqsoff
> tracer. The rationale IIUC is a lot of times the actual preempt event
> is done by say a lock, but the actual caller of the lock function
> causing the preempt disable event is also of interest so this gives
> some more information of interest without needing the full stack, for
> example the spinlock acquiring in add_wait_queue disabled preemption
> here:
>
> grep-1041 [002] d..1 80.363455: preempt_disable:
> caller=_raw_spin_lock_irqsave+0x1d/0x40
> parent=add_wait_queue+0x15/0x50

I suppose that sort-of makes sense for the preempt-tracer, but its a
weird thing for a generic tracepoint.

> >> void start_critical_timings(void)
> >> {
> >> + if (this_cpu_read(tracing_preempt_cpu))
> >> + trace_preempt_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >> +
> >> + if (this_cpu_read(tracing_irq_cpu))
> >> + trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >> +
> >> start_critical_timings_tracer();
> >> }
> >> EXPORT_SYMBOL_GPL(start_critical_timings);
> >>
> >> void stop_critical_timings(void)
> >> {
> >> + if (this_cpu_read(tracing_preempt_cpu))
> >> + trace_preempt_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >> +
> >> + if (this_cpu_read(tracing_irq_cpu))
> >> + trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >> +
> >> stop_critical_timings_tracer();
> >> }
> >> EXPORT_SYMBOL_GPL(stop_critical_timings);
> >
> > And I feel these yield somewhat odd semantics, does that need explaining
> > somewhere?
>
> Maybe I can add a comment here, if you prefer that. When you meant
> semantics, do you mean 'critical' vs 'atomic' thing or do you mean the
> semantics/context of how this function is supposed to be used?

I would add the comment to the tracepoint definition.

On semantics, the whole stop/start excludes a fair bunch of walltime
from our measurement, I feel that needs to be called out and enumerated
(when time goes missing and why).

Given that the idle thread runs with preempt-off I understand its
purpose from the POV from the preempt-tracer, but its 'weird' behaviour
if you're looking at it from a pure tracepoint pov.

> >> void trace_preempt_off(unsigned long a0, unsigned long a1)
> >> {
> >> + if (this_cpu_read(tracing_preempt_cpu))
> >> + return;
> >> +
> >> + this_cpu_write(tracing_preempt_cpu, 1);
> >> +
> >> + trace_preempt_disable_rcuidle(a0, a1);
> >> tracer_preempt_off(a0, a1);
> >> }
> >> #endif
> >
> > And here you assume things like double on / double off don't happen,
> > which might well be so, but does seem somewhat fragile.
> >
>
> We are handling the cases where these functions might be called twice,
> but we are only interested in the first time they're called. I caught
> a dead lock happen when I didn't add such protection to
> trace_hardirqs_off so I added to these to the trace_hardirqs* and
> trace_preempt* ones as well to just to be extra safe and keep it
> consistent. Hope I understood your concern correctly, if not please
> let me know, thanks.

Under what conditions where they called twice? That seems like something
that should not happen to begin with. Esp the one I left quoted above,
disabling when its already disabled sounds like fail. So please provide
more details on the scenario you're working around.