Re: [PATCH][V3] arm64: perf: Get the wrong PC value in REGS_ABI_32 mode

From: Will Deacon
Date: Thu May 28 2020 - 03:54:26 EST


On Thu, May 28, 2020 at 09:06:07AM +0800, Jiping Ma wrote:
> On 05/27/2020 11:19 PM, Mark Rutland wrote:
> > On Wed, May 27, 2020 at 09:33:00AM +0800, Jiping Ma wrote:
> > > On 05/26/2020 06:26 PM, Mark Rutland wrote:
> > > > On Mon, May 11, 2020 at 10:52:07AM +0800, Jiping Ma wrote:
> > > This modification can not fix our issue,  we need
> > > perf_reg_abi(current) == PERF_SAMPLE_REGS_ABI_32 to judge if it is 32-bit
> > > task or not,
> > > then return the correct PC value.
> > I must be missing something here.
> >
> > The core code perf_reg_abi(task) is called with the task being sampled,
> > and the regs are from the task being sampled. For a userspace sample for
> > a compat task, compat_user_mode(regs) should be equivalent to the
> > is_compat_thread(task_thread_info(task)) check.
> >
> > What am I missing?
> This issue caused by PC value is not correct. regs are sampled in function
> perf_output_sample_regs, that call perf_reg_value(regs, bit) to get PC
> value.
> PC value is regs[15] in perf_reg_value() function. it should be regs[32].
>
> perf_output_sample_regs(struct perf_output_handle *handle,
>                         struct pt_regs *regs, u64 mask)
> {
>         int bit;
>         DECLARE_BITMAP(_mask, 64);
>
>         bitmap_from_u64(_mask, mask);
>         for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) {
>                 u64 val;
>
>                 val = perf_reg_value(regs, bit);
>                 perf_output_put(handle, val);
>         }
> }

Yes, but Mark's point is that checking 'compat_user_mode(regs)' should be
exactly the same as checking 'perf_reg_abi(current) == PERF_SAMPLE_REGS_ABI_32'.
Are you saying that's not the case? If so, please can you provide an example
of when they are different?

Leaving that aside for a second, I also think it's reasonable to question
whether this whole interface is busted or not. I looked at it last night but
struggled to work out what it's supposed to do. Consider these three
scenarios, all under an arm64 kernel:

1. 64-bit perf + 64-bit application being profiled
2. 64-bit perf + 32-bit application being profiled
3. 32-bit perf + 32-bit application being profiled

It looks like the current code is a bodge to try to handle both (2) and
(3) at the same time:

- In case (3), userspace only asks about registers 0-15
- In case (2), we fudge the higher registers so that 64-bit SP and LR
hold the 32-bit values as a bodge to allow a 64-bit dwarf unwinder
to unwind the stack

So the idea behind the patch looks fine because case (3) is expecting the PC
in register 15 and instead gets 0, but the temptation is to clean this up so
that cases (2) and (3) report the same data to userspace (along the lines of
Mark's patch), namely only the first 16 registers with the PC moved down. We
can only do that if the unwinder is happy, which it might be if it only ever
looks up dwarf register numbers based on the unwind tables in the binary.
Somebody would need to dig into that. Otherwise, if it generates unconditional
references to things like register 30 to grab the link register, then we're
stuck with the bodge and need to special-case the PC.

Thoughts?

Will