Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events
From: Leo Yan
Date: Mon Oct 04 2021 - 02:31:01 EST
Hi James,
On Thu, Sep 30, 2021 at 04:08:52PM +0100, James Clark wrote:
> On 23/09/2021 15:23, Leo Yan wrote:
> > On Thu, Sep 16, 2021 at 02:01:21PM -0700, Namhyung Kim wrote:
[...]
> >>> We also considered to use PERF_RECORD_SWITCH_CPU_WIDE event for setting
> >>> pid/tid, the Intel PT implementation uses two things to set sample's
> >>> pid/tid: one is PERF_RECORD_SWITCH_CPU_WIDE event and another is to detect
> >>> the branch instruction is the symbol "__switch_to". Since the trace
> >>> event PERF_RECORD_SWITCH_CPU_WIDE is coarse, so it only uses the new
> >>> pid/tid after the branch instruction for "__switch_to". Arm SPE is
> >>> 'statistical', thus it cannot promise the trace data must contain the
> >>> branch instruction for "__switch_to", please see details [2].
> >>
> >> I can see the need in the Intel PT as it needs to trace all (branch)
> >> instructions, but is it really needed for ARM SPE too?
> >> Maybe I am missing something, but it seems enough to have a
> >> coarse-grained context switch for sampling events..
> >
> > The issue is that the coarse-grained context switch if introduces any
> > inaccuracy in the reported result. If we can run some workloads and
> > prove the coarse-grained context switch doesn't cause significant bias,
> > it will be great and can give us the confidence for this approach.
>
> It sounds like it's worth testing. Do you think the inaccuracy would only
> apply to code in the kernel around the time of the switch? Or do you think
> it could affect userspace as well?
The inaccuracy should only apply to kernel code. Some samples taken
between prepare_task_switch() and switch_to() will be wrongly
attributed to the next task.
> It seems to me that the switch event
> would have a timestamp that would precede _all_ userspace code, but I'm not
> 100% sure on that.
Yes, the switch event is generated in the scheduler, which precedes
the exit to userspace:
  __schedule()
    `> context_switch()
         `> prepare_task_switch()
              `> perf_event_task_sched_out()
> I suppose it's easy to test.
I'd like to use a comparison method for the test:
We enable PID tracing and capture it in perf.data. Then, when decoding
the trace data, we generate two results, one based on the context
packets and one based on the switch events, and check how much these
results differ.
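As a rough illustration of that comparison (a sketch only: the tuple
layouts, the function names and the sample data are made up for this
example and are not perf code), each sample carries the PID from its
context packet, and the same sample is also assigned a PID from the most
recent switch event:

```python
from bisect import bisect_right


def pid_from_switch_events(switch_events, timestamp):
    """Return the PID switched in at or before `timestamp`.

    `switch_events` is a list of (timestamp, next_pid) tuples sorted by
    timestamp (hypothetical layout). Returns None when no switch event
    precedes the sample.
    """
    times = [t for t, _ in switch_events]
    idx = bisect_right(times, timestamp) - 1
    return switch_events[idx][1] if idx >= 0 else None


def compare_attribution(samples, switch_events):
    """List samples whose context-packet PID differs from the switch-event PID.

    `samples` is a list of (timestamp, context_pid) tuples (hypothetical layout).
    """
    mismatches = []
    for ts, context_pid in samples:
        switch_pid = pid_from_switch_events(switch_events, ts)
        if switch_pid != context_pid:
            mismatches.append((ts, context_pid, switch_pid))
    return mismatches


# Invented data: the switch event for PID 2 is logged at t=200, but the
# sample at t=205 still carries PID 1 in its context packet.
switch_events = [(100, 1), (200, 2)]
samples = [(150, 1), (205, 1), (250, 2)]
mismatches = compare_attribution(samples, switch_events)
```

With this invented data only the sample at t=205 disagrees, which is the
kind of in-kernel window described above. The fraction of mismatching
samples in a real workload is the number we want to look at.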
> German Gomez actually starting looking into the switch events for SPE at the
> same time, so I've CCd him and maybe he can do some testing of the patch.
Cool! German is welcome to continue the related work. I am on holiday
this week; I will try this as well and will get back to you if I reach
any conclusion.
If the test results look good enough, I personally think we need to
finish the items below:
- Enable PID tracing and decode with context packets;
- Provide an interface to user space so the perf tool knows whether it
should use the hardware PID or fall back to context switch events;
- Merge Namhyung's patch for using switch events for samples.
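The fallback in the second item could look roughly like this on the
decoder side. This is a sketch under assumptions: the two boolean inputs
and the returned strings are hypothetical, and how the perf tool would
actually learn whether hardware PID tracing is available is exactly the
interface that still needs to be defined.

```python
def choose_pid_source(has_context_packets, has_switch_events):
    """Pick how the decoder assigns pid/tid to samples.

    Both flags are hypothetical inputs standing in for whatever the
    kernel/perf interface ends up reporting.
    """
    if has_context_packets:
        # Hardware PID from context packets is preferred when present.
        return "context-packet"
    if has_switch_events:
        # Otherwise fall back to the coarse-grained switch events.
        return "switch-event"
    # Neither source is available; samples keep no pid/tid.
    return "none"
```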
Thanks,
Leo