Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

From: Travis Downs
Date: Wed Nov 14 2018 - 21:16:27 EST


On Sun, Nov 11, 2018 at 10:26 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
> On Sat, Nov 10, 2018 at 09:50:05PM -0500, Travis Downs wrote:

> LBR is not part of PEBS, but is collected separately in the PMI handler.

Thanks for clearing this up - so you can ignore any earlier
suggestions on my part of trying to use LBR to fix the unwinding
inconsistency.

> > In the case that PEBS events are used, the IP will differ essentially 100%
> > of the time, right? That is, there will always be *some* skid.
>
> I wouldn't say that. It depends on what the CPU is doing and the IPC
> of the code.

Well, other than in cases like a long-latency cache miss, it is my
experience that the skid is almost never zero. That is, the PEBS and
ireg IP will usually differ. This is mostly moot though: what is
important is how often the ireg skid results in a different call chain
(i.e., a return occurred between the PEBS point and the interrupt), as
you have pointed out.

>
> Also the backtrace inconsistency can only happen if the sample races with
> function return. If you don't then the backtrace will point
> to the correct function, even though the unwind IP is different.
>
> For example in the common case where you profile a long loop it
> is unlikely to happen.

Agreed.


> Could collect numbers how often it happens, but it would surprise
> me if anything complicated is worth it. I would just do the minimum fixes
> to address the unwinder errors, and perhaps add the "unwind ip differs"
> indication.

As above, I think the most important UX problem is not when the IP
differs, but when the top frame of the IP unwind is different than the
function in which the PEBS sample occurred. I think the case where the
skid ends up with both in the same function doesn't pose any
presentation difficulties [1]. When they are different though, it
seems tough to present a consistent picture.
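
As a rough illustration of that distinction, here is a minimal sketch
that sorts a sample into "exact", "same-function" or
"different-function". The symbol table, the addresses and the names
function_of() and classify_sample() are all made up for the example;
this is not how perf resolves symbols or stores samples.

```python
from bisect import bisect_right

# Hypothetical symbol table: (start_address, name), sorted by address.
SYMBOLS = [(0x1000, "T"), (0x2000, "B"), (0x3000, "A"), (0x4000, "C")]
STARTS = [start for start, _ in SYMBOLS]


def function_of(ip):
    """Map an instruction pointer to the name of the enclosing function."""
    idx = bisect_right(STARTS, ip) - 1
    if idx < 0:
        raise ValueError(f"ip {ip:#x} is below the first symbol")
    return SYMBOLS[idx][1]


def classify_sample(pebs_ip, unwind_ips):
    """Compare the PEBS IP with the innermost frame of the unwound stack.

    unwind_ips[0] is assumed to be the interrupt-time (ireg) IP.
    """
    if pebs_ip == unwind_ips[0]:
        return "exact"  # no skid at all
    if function_of(pebs_ip) == function_of(unwind_ips[0]):
        return "same-function"  # skid stayed inside one function
    return "different-function"  # skid crossed a call or return
```

Only the "different-function" bucket is the one that is hard to
present; the other two can be shown with the unwound stack as-is.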

[1] Strictly speaking, the "IPs are in the same function" check is not
sufficient. Imagine a scenario where you have T->B->A (T calls B calls
A) and the PEBS sample happens in A, then A and B return, and then C
and A are called in turn (T->C->A) before the PMI happens. Now the PEBS IP and
the ireg IP are in the same function, but the stacks are still
inconsistent. It is probably fine to paper over this and show the user
the T->C->A stack, as this stack is accurate in some sense (it really
happened), but the user might be confused when he looks at the
annotation for A and sees code being executed (having PEBS samples)
that he knows can never execute when C calls A (for example), since the
annotations are based on the hidden T->B->A execution...
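
The T->B->A versus T->C->A scenario can be replayed with a toy shadow
stack to show why comparing only the leaf function is not enough. This
is purely illustrative: the event list and the replay() helper are
hypothetical and stand in for what the hardware and the PMI handler
would actually observe.

```python
def replay(events):
    """Replay call/return events, snapshotting the stack at each marker."""
    stack, snapshots = [], {}
    for kind, name in events:
        if kind == "call":
            stack.append(name)
        elif kind == "ret":
            if not stack or stack[-1] != name:
                raise ValueError(f"unbalanced return from {name}")
            stack.pop()
        else:  # "pebs" or "pmi" marker: record the current call chain
            snapshots[kind] = tuple(stack)
    return snapshots


EVENTS = [
    ("call", "T"), ("call", "B"), ("call", "A"),
    ("pebs", None),              # PEBS record is captured inside A
    ("ret", "A"), ("ret", "B"),  # A and B return during the skid
    ("call", "C"), ("call", "A"),
    ("pmi", None),               # interrupt arrives; unwinding starts here
]

snap = replay(EVENTS)
same_leaf = snap["pebs"][-1] == snap["pmi"][-1]
same_stack = snap["pebs"] == snap["pmi"]
print(snap["pebs"], snap["pmi"], same_leaf, same_stack)
```

Here same_leaf is true while same_stack is false, which is exactly the
case where a leaf-only comparison would report the sample as
consistent.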

Bleh.