Re: [PATCH 2/3] arm64: perf: Improve compat perf_callchain_user() for clang leaf functions

From: Doug Anderson
Date: Mon Jun 07 2021 - 17:04:38 EST


Hi,

On Mon, Jun 7, 2021 at 2:14 AM James Clark <james.clark@xxxxxxx> wrote:
>
>
>
> On 07/05/2021 23:55, Douglas Anderson wrote:
> > It turns out that even when you compile code with clang with
> > "-fno-omit-frame-pointer" that it won't generate a frame pointer for
> > leaf functions (those that don't call any sub-functions). Presumably
> > clang does this to reduce the overhead of frame pointers. In a leaf
> > function you don't really need frame pointers since the Link Register
> > (LR) is guaranteed to always point to the caller>
> [...]
> >
> > arch/arm64/kernel/perf_callchain.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
> > index e5ce5f7965d1..b3cd9f371469 100644
> > --- a/arch/arm64/kernel/perf_callchain.c
> > +++ b/arch/arm64/kernel/perf_callchain.c
> > @@ -326,6 +326,20 @@ static void compat_perf_callchain_user(struct perf_callchain_entry_ctx *entry,
> > while ((entry->nr < entry->max_stack) && fp && !(fp & 0x3)) {
> > err = compat_perf_trace_1(&fp, &pc, leaf_lr);
> >
> > + /*
> > + * If this is the first trace and it didn't find the LR then
> > + * let's throw it in the trace first. This isn't perfect but
> > + * is the best we can do for handling clang leaf functions (or
> > + * the case where we're right at the start of the function
> > + * before the new frame has been pushed). In the worst case
> > + * this can cause us to throw an extra entry that will be some
> > + * location in the same function as the PC. That's not
> > + * amazing but shouldn't really hurt. It seems better than
> > + * throwing away the LR.
> > + */
>
> Hi Douglas,
>
> I think the behaviour with GCC is also similar. We were working on this change
> (https://lore.kernel.org/lkml/20210304163255.10363-4-alexandre.truong@xxxxxxx/)
> in userspace Perf which addresses the same issue.
>
> The basic concept of our version is to record only the link register
> (as in --user-regs=lr). Then use the existing dwarf based unwind
> to determine if the link register is valid for that frame, and then if
> it is and it doesn't already exist on the stack then insert it.
>
> You mention that your version isn't perfect, do you think that saving the
> LR and using something like libunwind in a post process could be better?

Using post processing atop a patch to always save the LR is definitely
the right solution IMO and (I think) you can fully overcome the "no
frame pointers in leaf functions" with the post processing.

-Doug