Re: [PATCH v5 5/5] perf arm-spe: Don't wait for PERF_RECORD_EXIT event

From: Leo Yan
Date: Mon Jun 28 2021 - 08:12:34 EST


On Fri, Jun 25, 2021 at 02:25:15PM +0100, James Clark wrote:
>
>
> On 19/05/2021 08:19, Leo Yan wrote:
> > When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last
> > perf event) for processing trace data, which is needless and even might
> > cause logic error, e.g. it might fail to correlate perf events with Arm
> > SPE events correctly.
> >
> > So this patch removes the condition checking for PERF_RECORD_EXIT event.
> >
> > Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
> > ---
> > tools/perf/util/arm-spe.c | 6 +-----
> > 1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 5c5b438584c4..58b7069c5a5f 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -717,11 +717,7 @@ static int arm_spe_process_event(struct perf_session *session,
> > sample->time);
> > }
> > } else if (timestamp) {
> > - if (event->header.type == PERF_RECORD_EXIT) {
> > - err = arm_spe_process_queues(spe, timestamp);
> > - if (err)
> > - return err;
> > - }
> > + err = arm_spe_process_queues(spe, timestamp);
> > }
> >
> > return err;
> >
>
> For the whole set:
> Reviewed-by: James Clark <james.clark@xxxxxxx>
> Tested-by: James Clark <james.clark@xxxxxxx>

> I see a big improvement in decoding involving multiple processes because the timestamps are now
> correlated with the comm and mmap events.
>
> For example perf-exec samples are visible right before the exec is done, and on an
> application that forks, samples are visible from all processes. For example:
>
> perf record -e arm_spe// -- bash -c "stress -c 1"
> perf script
>
> perf-exec 4502 [003] 259755.050409: 1 l1d-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050409: 1 tlb-access: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050409: 1 memory: ffff80001014b840 sched_clock+0x40 ([kernel.kallsyms])
> perf-exec 4502 [003] 259755.050411: 1 tlb-access: ffff800010120fb8 __rcu_read_lock+0x0 ([kernel.kallsyms])
> bash 4502 [003] 259755.050411: 1 branch-miss: ffff8000105b2a40 memcpy+0x80 ([kernel.kallsyms])
> bash 4502 [003] 259755.050411: 1 tlb-access: 0 [unknown] ([unknown])
> ...
> stress 4502 [003] 259755.051468: 1 l1d-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> stress 4502 [003] 259755.051468: 1 tlb-access: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
> stress 4502 [003] 259755.051468: 1 memory: ffff800010259a24 __vma_adjust+0x1f4 ([kernel.kallsyms])
>
> Previously samples were only attributed to 'stress', which was obviously wrong.

Thanks a lot for the review and testing, James!

Hi Arnaldo, I confirmed this patch set can be cleanly applied on
the latest acme/perf/core branch, so could you pick up this patch
set?

Thanks,
Leo