Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events

From: German Gomez
Date: Tue Oct 19 2021 - 08:21:24 EST


Hi Leo,

Many thanks for you comments as always and sorry for the rushed patch.

On 18/10/2021 14:23, Leo Yan wrote:
> Hi German,
>
> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
>> Hi,
>>
>> What do you thing of the patch below? PERF_RECORD_SWITCH events are also
>> included for tracing forks. The patch would sit on top of Namhyung's.
> Yeah, it's good to add PERF_RECORD_SWITCH.
>
>> On 12/10/2021 12:07, German Gomez wrote:
>>> Hi, Leo and Namhyung,
>>>
>>> I want to make sure I'm on the same page as you regarding this topic.
>>>
>>> [...]
>>>
>>> If we are not considering patching the driver at this stage, so we allow
>>> hardware tracing on non-root namespaces. I think we could proceed like
>>> this:
>>>
>>> � - For userspace, always use context-switch events as they are
>>> ��� accurate and consistent with namespaces.
> I don't think you can always use context-switch events for userspace
> samples. The underlying mechanism is when there have context-switch
> event or context packet is coming, it will invoke the function
> machine__set_current_tid() to set current pid/tid; afterwards, we
> can retrieve the current pid/tid with the function
> arm_spe_set_pid_tid_cpu().
>
> The question is that if we want to use the tid/pid info at the same
> time for both context-switch events and context packets, then it's
> hard to maintain. E.g. we need to create multiple thread context, one
> is used to track pid info coming from context-switch events and
> another context is to track pid info from context packet.

My thinking was to use only one of the methods for the entire run, but
the code below is not expressive enough I'm afraid and I agree it could
become hard to maintain. I need to polish it up.

>
> To simplify the code, I still think we give context packet priority and
> use it if it's avalible. And we rollback to use context-switch events
> for pid/tid when context packet is not avaliable.

OK if it simplifies things. I think context-pkt availability can be
determined before any events are processed by looking at the top record
in the auxtrace_heap, o any of the auxtrace_queues.

>>> � - For kernel tracing, if context packets are enabled, use them, but
>>> ��� warn the user that the PIDs correspond to the root namespace.
>>> � - Otherwise, use context-switch events and warn the user of the time
>>> ��� inaccuracies.
>>>
>>> Later, if the driver is patched to disable context packets outside the
>>> root namespace, kernel tracing could fall back to using context-switch
>>> events and warn the user with a single message about the time
>>> inaccuracies.
>>>
>>> If we are aligned, we could collect your feedback and share an updated
>>> patch that considers the warnings.
>>>
>>> Many thanks
>>> Best regards
>> ---
>> �tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>> �1 file changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>> index 708323d7c93c..6a2f7a484a80 100644
>> --- a/tools/perf/util/arm-spe.c
>> +++ b/tools/perf/util/arm-spe.c
>> @@ -71,6 +71,17 @@ struct arm_spe {
>> ���� u64��� ��� ��� ��� kernel_start;
>> �
>> ���� unsigned long��� ��� ��� num_events;
>> +
>> +��� /*
>> +��� �* Used for PID tracing.
>> +��� �*/
>> +��� u8��� ��� ��� ��� exclude_kernel;
>> +
>> +��� /*
>> +��� �* Warning messages.
>> +��� �*/
>> +��� u8��� ��� ��� ��� warn_context_pkt_namesapce;
>> +��� u8��� ��� ��� ��� warn_context_switch_ev_accuracy;
>> �};
>> �
>> �struct arm_spe_queue {
>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>> ���� return timeless_decoding;
>> �}
>> �
>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
>> +��� struct evsel *evsel;
>> +��� struct evlist *evlist = spe->session->evlist;
>> +
>> +��� evlist__for_each_entry(evlist, evsel) {
>> +��� if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
>> +��� ��� return true;
>> +��� }
>> +
>> +��� return false;
>> +}
>> +
>> �static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>> ���� ��� ��� ��� ��� struct auxtrace_queue *queue)
>> �{
>> ���� struct arm_spe_queue *speq = queue->priv;
>> -��� pid_t tid;
>> +��� pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
>> +��� u64 context_id = speq->decoder->record.context_id;
>> +
>> +��� /*
>> +��� * We're tracing the kernel.
>> +��� */
>> +��� if (!spe->exclude_kernel) {
> This is incorrect ... 'exclude_kernel' is a global variable and if
> it's set then perf will always run below code.
>
> I think here you want to avoid using contect packet for user space
> samples, but checking 'exclude_kernel' cannot help for this purpose
> since 'exclude_kernel' cannot be used to decide sample mode (kernel
> mode or user mode).
>
> Thanks,
> Leo
>
>> +��� ��� /*
>> +��� ��� �* Use CONTEXT packets in kernel tracing if available and warn the user of the
>> +��� ��� �* values correspond to the root PID namespace.
>> +��� ��� �*
>> +��� ��� �* If CONTEXT packets aren't available but context-switch events are, warn the user
>> +��� ��� �* of the time inaccuracies.
>> +��� ��� �*/
>> +��� ��� if (context_id != (u64) -1) {
>> +��� ��� ��� tid = speq->decoder->record.context_id;
>> +��� ��� ��� spe->warn_context_pkt_namesapce = true;
>> +��� ��� } else if (tid != -1 && context_id == (u64) -1)
>> +��� ��� ��� spe->warn_context_switch_ev_accuracy = true;
>> +��� }
>> �
>> ���� tid = machine__get_current_tid(spe->machine, speq->cpu);
>> ���� if (tid != -1) {
>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>> ���� ��� if (err)
>> ���� ��� ��� return err;
>> �
>> -��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>> +��� ��� if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>> +��� ��� ��� event->header.type == PERF_RECORD_SWITCH)
>> ���� ��� ��� err = arm_spe_context_switch(spe, event, sample);
>> ���� }
>> �
>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>> ���� ��� return arm_spe_process_timeless_queues(spe, -1,
>> ���� ��� ��� ��� MAX_TIMESTAMP - 1);
>> �
>> -��� return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>> +��� ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
>> +
>> +��� if (spe->warn_context_pkt_namesapce)
>> +��� ��� ui__warning(
>> +��� ��� ��� "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
>> +��� ��� ��� "PID values correspond to the root PID namespace.\n\n");
>> +
>> +��� if (spe->warn_context_switch_ev_accuracy)
>> +��� ��� ui__warning(
>> +��� ��� ��� "No Arm SPE CONTEXT packets found within traces.\n\n"
>> +��� ��� ��� "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
>> +��� ��� ��� "workload-dependant timing inaccuracies.\n\n");
>> +
>> +��� return ret;
>> �}
>> �
>> �static void arm_spe_free_queue(void *priv)
>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>> �
>> ���� spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>> �
>> +��� spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
>> +��� spe->warn_context_pkt_namesapce = false;
>> +��� spe->warn_context_switch_ev_accuracy = false;
>> +
>> ���� /*
>> ���� �* The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>> ���� �* and the parameters for hardware clock are stored in the session
>> --
>> 2.17.1


Thanks,
German