Re: [PATCH 2/3] perf callchain: Stop resolving callchains after invalid address

From: Ingo Molnar
Date: Fri Nov 27 2015 - 02:48:51 EST



* Namhyung Kim <namhyung@xxxxxxxxxx> wrote:

> Hi Ingo,
>
> On Thu, Nov 26, 2015 at 08:43:35AM +0100, Ingo Molnar wrote:
> >
> > * Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > > Unwinding optimized binaries using the frame pointer gives garbage. Check
> > > each callchain address and stop if it's below the vm.mmap_min_addr sysctl value.
> > >
> > > Before:
> > > $ perf report --stdio --no-children -g callee
> > > ...
> > >
> > > 1.37% perf [kernel.vmlinux] [k] smp_call_function_single
> > > |
> > > ---smp_call_function_single
> > > _perf_event_enable
> > > perf_event_for_each_child
> > > perf_ioctl
> > > do_vfs_ioctl
> > > sys_ioctl
> > > entry_SYSCALL_64_fastpath
> > > __GI___ioctl
> > > 0
> > > 0
> > > 0x1c5aa70
> > > 0x1c5b910
> > > 0x1c5aa70
> > > 0x1c5b910
> > > 0x1c5aa70
> > > 0x1c5b910
> > > 0x1c5aa70
> > > 0x1c5b910
> > > 0x1c5aa70
> > > 0x1c5b910
> > > ...
> > >
> > > After:
> > > $ perf report --stdio --no-children -g callee
> > > ...
> > >
> > > 1.37% perf [kernel.vmlinux] [k] smp_call_function_single
> > > |
> > > ---smp_call_function_single
> > > _perf_event_enable
> > > perf_event_for_each_child
> > > perf_ioctl
> > > do_vfs_ioctl
> > > sys_ioctl
> > > entry_SYSCALL_64_fastpath
> > > __GI___ioctl
> >
> > In addition to that, would it make sense to terminate the callchain with an
> > indicator that we found something anomalous? Such an extra line:
> >
> > ...
> >
> > would not be intrusive, but would tell the informed reader that it's not a normal
> > ending of the call chain.
> >
> > This assumes that we can tell apart 'normal end of call chain' from 'seems to end
> > with garbage pointer' cases - can we do that?
>
> In case of fp unwind, I'm not sure we can determine whether it's the
> normal end or not, especially for optimized binaries. It seems the kernel
> can also stop the callchain at any time if it sees a broken frame.
>
> For dwarf unwind, I think it's also hard to tell since it can be
> stopped for various reasons like insufficient dump size or broken CFI,

But but. Doesn't your patch 'detect' an anomaly to begin with?

+ /*
+ * Callchain value under mmap_min_addr means it's broken
+ * or the end of callchain. Stop.
+ */
+ if (ip < mmap_min_addr) {
+ if (callchain_param.order == ORDER_CALLEE)
+ break;

All I'm asking for is to indicate it in some low-key visual fashion when we
encounter such a 'broken' call-chain.

I presume the 'old' way of ending the call-chain was that 'ip' was zero, right? We
should not print the indicator in that case.
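A minimal sketch of the distinction proposed above, assuming the raw callchain is a plain array of u64 instruction pointers and that the caller has already read the threshold from /proc/sys/vm/mmap_min_addr. The enum and function names are hypothetical and are not perf's actual code: a zero ip ends the chain quietly, while a non-zero ip below mmap_min_addr ends it and appends the low-key '...' marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* How a callchain walk ended (hypothetical names, not perf's). */
enum chain_end {
	CHAIN_END_NORMAL,	/* ran out of entries, or hit ip == 0 */
	CHAIN_END_ANOMALOUS,	/* hit a non-zero ip below mmap_min_addr */
};

/*
 * Return how many leading entries of ips[] are usable, and report
 * in *end whether the walk stopped normally or on a suspicious ip.
 */
size_t usable_callchain_len(const uint64_t *ips, size_t n,
			    uint64_t mmap_min_addr, enum chain_end *end)
{
	size_t i;

	assert(end != NULL);
	for (i = 0; i < n; i++) {
		if (ips[i] == 0) {
			/* old-style terminator: end quietly, no marker */
			*end = CHAIN_END_NORMAL;
			return i;
		}
		if (ips[i] < mmap_min_addr) {
			/* userspace cannot map below this: likely garbage */
			*end = CHAIN_END_ANOMALOUS;
			return i;
		}
	}
	*end = CHAIN_END_NORMAL;
	return n;
}

/* Print the usable part of the chain, plus "..." if it ended abnormally. */
void print_callchain(const uint64_t *ips, size_t n, uint64_t mmap_min_addr)
{
	enum chain_end end;
	size_t len = usable_callchain_len(ips, n, mmap_min_addr, &end);

	for (size_t i = 0; i < len; i++)
		printf("%#llx\n", (unsigned long long)ips[i]);
	if (end == CHAIN_END_ANOMALOUS)
		printf("...\n");	/* low-key hint: chain looks broken */
}
```

For example, with mmap_min_addr = 65536 a chain {0x7f0000001000, 0x1000} would print the first address followed by '...', while {0x7f0000001000, 0} would print only the first address.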

Also, in the dwarf case I'd see value in indicating whether any of these events
occurred:

> For dwarf unwind, I think it's also hard to tell since it can be stopped for
> various reasons like insufficient dump size or broken CFI,

even if we cannot catch all anomalies. Performance analysis must stand firm on a
hard rock of reliability and dependability, and we should always propagate
information about possible profiling data corruption/unreliability. That's why we
print the 'IO overload' messages during perf record for example.

That applies even if the problem is caused not by perf, but by external factors
such as the compiler/linker.
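Along the same lines, a hypothetical sketch of how unwinder stop reasons could be classified and counted, so that a single summary warning can be printed at report time, in the spirit of the 'IO overload' message. The enum values only mirror the cases mentioned in this thread (zero ip, ip below mmap_min_addr, insufficient dump size, broken CFI); none of these names exist in perf, and the warning text is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reasons an unwinder stopped (illustrative names only). */
enum unwind_stop {
	UNWIND_STOP_END,	/* reached the outermost frame */
	UNWIND_STOP_ZERO_IP,	/* old-style terminator: ip == 0 */
	UNWIND_STOP_BAD_IP,	/* ip below mmap_min_addr (fp unwinding) */
	UNWIND_STOP_DUMP_SHORT,	/* ran past the copied user stack */
	UNWIND_STOP_BAD_CFI,	/* DWARF CFI missing or unusable */
};

/* Should this ending be flagged to the reader? */
bool stop_is_suspicious(enum unwind_stop r)
{
	switch (r) {
	case UNWIND_STOP_END:
	case UNWIND_STOP_ZERO_IP:
		return false;	/* normal endings: no indicator */
	case UNWIND_STOP_BAD_IP:
	case UNWIND_STOP_DUMP_SHORT:
	case UNWIND_STOP_BAD_CFI:
		return true;	/* chain may be truncated or corrupted */
	}
	return true;		/* unknown value: err on the side of warning */
}

struct unwind_stats {
	size_t total;
	size_t suspicious;
};

/* Record one finished callchain. */
void unwind_stats_add(struct unwind_stats *s, enum unwind_stop r)
{
	assert(s != NULL);
	s->total++;
	if (stop_is_suspicious(r))
		s->suspicious++;
}

/* Print a one-line summary, only if something looked wrong. */
void unwind_stats_report(const struct unwind_stats *s, FILE *out)
{
	if (s->suspicious)
		fprintf(out, "Warning: %zu of %zu callchains ended abnormally "
			"and may be incomplete.\n", s->suspicious, s->total);
}
```

Even when not every anomaly can be detected, counting the ones that can be gives the reader a rough measure of how much of the profile rests on possibly unreliable callchains.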

Thanks,

Ingo