Re: [PATCH 0/3] arm64: perf: Make compat tracing better

From: Doug Anderson
Date: Mon Jun 07 2021 - 16:44:04 EST


Hi,

On Wed, Jun 2, 2021 at 10:56 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> Hi Doug,
>
> Thanks for posting this, and sorry for the delay in getting to it.
>
> On Fri, May 07, 2021 at 01:55:10PM -0700, Douglas Anderson wrote:
> > The goal for this series is to improve "perf" behavior when 32-bit
> > userspace code is involved. This turns out to be fairly important for
> > Chrome OS which still runs 32-bit userspace for the time being (long
> > story there).
>
> Watch out, your days are numbered! See [1].

Yeah, folks on the Chrome OS team are aware and we're trying our
darndest to move away. It's been an unfortunate set of circumstances
that has kept us on 32-bit this long. :( BTW: I like your suggestion
of "retirement" as a solution to dealing with this problem, but I'm
not quite ready to retire yet.


> > I won't repeat everything said in the individual patches since
> > they are wordy enough as it is.
> >
> > Please enjoy and I hope this isn't too ugly/hacky for inclusion in
> > mainline.
> >
> > Thanks to Nick Desaulniers for his early review of these patches and
> > to Ricky for the super early prototype that some of this is based on.
>
> I can see that you've put a lot of effort into this, but I'm not thrilled
> with the prospect of maintaining these heuristics in the kernel. The
> callchain behaviour is directly visible to userspace, and all we'll be able
> to do is throw more heuristics at it if faced with any regression reports.
> Every assumption made about userspace behaviour results in diminishing
> returns where some set of programs no longer fall into the "supported"
> bucket and, on balance, I don't think the trade-off is worth it.
>
> If we were to do this in the kernel, then I'd like to see a spec for how
> frame-pointer based unwinding should work for Thumb and have it agreed
> upon and implemented by both GCC and LLVM. That way, we can implement
> the unwinder according to that spec and file bug reports against the
> compiler if it goes wrong.

Given how long this has been going on, I'd somewhat guess that getting
this implemented in GCC and LLVM is 1+ year out. Presumably Chrome OS
will be transitioned off 32-bit ARM by then.


> In lieu of that, I think we must defer to userspace to unwind using DWARF.
> Perf supports this via PERF_SAMPLE_STACK_USER and PERF_SAMPLE_REGS_USER,
> which allows libunwind to be used to create the callchain. You haven't
> mentioned that here, so I'd be interested to know why not.

Good point. So I guess I didn't mention it because:

a) I really know very little about perf. I got roped into this because I
understand stack unwinding, not because I know how to use perf well.
:-P So I personally have no idea how to set this up.

b) In the little bit of reading I did about this, people seemed to say
that using libunwind for perf sampling was just too slow / too much
overhead.


> Finally, you've probably noticed that our unwinding code for compat tasks
> is basically identical to the code in arch/arm/. If the functionality is
> going to be extended, it should be done there first and then we will follow
> to be compatible.

That's fair. I doubt that submitting patches to this area of code for
arm32 would be enjoyable, so I'll pass if it's all the same.

Given your feedback, I think it's fair to consider the ${SUBJECT} patch
abandoned then. I'll see if people want to land it as a private patch
in the Chrome OS tree for the time being until we can more fully
abandon arm32 support or until the ARM teams working on gcc and clang
come up with a standard that we can support more properly.

In the meantime, if anyone cares to pick this patch up and move
forward, feel free to do so with my blessing.

-Doug