Re: [PATCH] Add support for disabling Intel PT trace in ftrace
From: Thomas Gleixner
Date: Wed Nov 23 2016 - 05:21:34 EST
On Tue, 22 Nov 2016, Steven Rostedt wrote:
> On Fri, 18 Nov 2016 08:55:24 -0800
> Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>
> > From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> >
> > ftrace has powerfull trigger functions. Intel PT on modern Intel CPUs
> > can trace execution flow.
> >
> > For debugging I found it useful to disable the PT trace from ftrace triggers,
> > for example when specific kernel functions are hit, which indicate
> > a problem. Then we can see the exact execution trace up to this point.
> >
> > This patch adds a "ptoff" ftrace trigger/function that disables the trace
> > on the current function. The PT trace still has to be set up with perf
> >
> > % perf record -e intel_pt// -a ... &
> > % cd /sys/kernel/debug/tracing
> > % echo do_page_fault:ptoff > set_ftrace_filter
> > ...
> > % cd -
> > % kill %1
> > % perf script --itrace=i0ns
> >
> > I only implemented local disabling. Enabling would be much more complicated
> > and require a black list of functions to avoid recursion. Global
> > disabling with IPIs would be possible, but also risk some deadlock
> > scenarios. Local disabling is very easy and can be done without
> > accessing any special state, so there are no such problems. It is
> > usually good enough for debugging purposes. The trace can be always
> > reenabled from perf.
> >
> > This patch adds "ptoff" both as ftrace trigger and ftrace functions.
> > This makes it work from "set_ftrace_filter" and through the trigger
> > field of trace points.
This changelog reads like a fairy tale and certainly needs some care.
> > The PT driver exports a pt_disable() function for this that can be also
> > used for manual instrumentation.
> > diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> > index 185c39fea2a0..5dc8ec658678 100644
> > --- a/Documentation/trace/ftrace.txt
> > +++ b/Documentation/trace/ftrace.txt
> > @@ -2549,6 +2549,11 @@ The following commands are supported:
> > command, it only prints out the contents of the ring buffer for the
> > CPU that executed the function that triggered the dump.
> >
> > +- ptoff
> > + When the function is hit disable Intel PT trace. The Intel PT
> > + trace has to be set up earlier with perf record -a -e intel_pt// ...
> > + This disables the trace on the current CPU only.
When the function is hit, Intel PT trace is disabled on the current
CPU; PT trace on other CPUs is unaffected.
The Intel PT trace has to be set up earlier with:
perf record -e intel_pt// ...
> > +/*
> > + * Disable the PT trace for debugging purposes.
> > + */
> > +void pt_disable(void)
> > +{
> > + u64 val;
> > +
> > + if (!boot_cpu_has(X86_FEATURE_INTEL_PT))
> > + return;
> > +
> > + rdmsrl_safe(MSR_IA32_RTIT_CTL, &val);
Why is this using rdmsrl_safe()? If that's required then the return value
should be checked and acted upon.
> > + val &= ~RTIT_CTL_TRACEEN;
> > + wrmsrl_safe(MSR_IA32_RTIT_CTL, val);
How does this interact with perf, the PT nmi handler, context switch etc.?
I can't see anything, but the implications of this need to be documented.
> > +}
> > +EXPORT_SYMBOL(pt_disable);
This export is not required for the trace trigger. This wants to be a
seperate patch, EXPORT_SYMBOl_GPL and an in tree user as any other export..
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 4741ecdb9817..a408d288298b 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -1387,4 +1387,6 @@ int perf_event_exit_cpu(unsigned int cpu);
> > #define perf_event_exit_cpu NULL
> > #endif
> >
> > +void pt_disable(void);
This is the wrong header file to declare the function. Aside of that
pt_disable() is a too generic name. Something like x86_intel_pt_disable()
makes it clear what this is about.
Thanks,
tglx