Re: [RFC] perf arm-spe: Track task context switch for cpu-mode events

From: German Gomez
Date: Fri Oct 29 2021 - 06:51:26 EST


Hi Leo,

The current plan is to define a global flag in `struct arm_spe` to
select the method of pid tracing (context packets or switch events):

    struct arm_spe {
        /* ... */
        u8        use_ctx_pkt_for_pid;
    };

The method could be determined by peeking at the top element of the
`struct auxtrace_heap` at the beginning of perf-report. If ctx packets
have been collected, the first one should have a context_id != -1. We
could then tweak this part of Namhyung's patch slightly:

    if (!spe->use_ctx_pkt_for_pid &&
        (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
         event->header.type == PERF_RECORD_SWITCH))
            err = arm_spe_context_switch(spe, event, sample);
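
To make the detection step concrete, here is a minimal, self-contained
sketch of how the flag could be derived. It assumes simplified stand-ins
(`struct spe_record_stub`, `struct spe_stub`) and hypothetical helpers
(`spe_detect_ctx_pkt()`, `spe_wants_switch_events()`); none of these
exist in perf, and the real code would take the first record from the
auxtrace_heap rather than receive it as an argument:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoded SPE record; only the field used here. */
struct spe_record_stub {
	uint64_t context_id;	/* (uint64_t)-1 when no CONTEXT packet was seen */
};

/* Hypothetical stand-in for the relevant part of struct arm_spe. */
struct spe_stub {
	uint8_t use_ctx_pkt_for_pid;
};

/*
 * Decide once, before any event is processed, which PID source to use:
 * context packets if the first record carries a valid context_id,
 * context-switch events otherwise (including when there is no record).
 */
static void spe_detect_ctx_pkt(struct spe_stub *spe,
			       const struct spe_record_stub *first)
{
	spe->use_ctx_pkt_for_pid =
		first && first->context_id != (uint64_t)-1;
}

/* Mirrors the condition above: switch events are only a fallback. */
static bool spe_wants_switch_events(const struct spe_stub *spe)
{
	return !spe->use_ctx_pkt_for_pid;
}
```

Deciding once up front keeps a single source of pid/tid for the whole
run, which avoids mixing the two methods.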

Then we could apply patch [1], which wasn't fully merged in the end,
adding a similar `if (spe->use_ctx_pkt_for_pid)` check to collect the
pid/tid from the context packets.
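
As a rough illustration of that second half, the sketch below shows how
the tid could be chosen per record once the flag is known. The helper
`spe_pick_tid()` and its parameters are hypothetical; in perf the
`current_tid` value would come from machine__get_current_tid(), and this
is not the code from patch [1]:

```c
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the tid to attribute a record to.
 *
 * - With context packets selected and a valid context_id in the record,
 *   the context_id is used as the tid.
 * - Otherwise the previously known tid is kept. When switch events are
 *   the selected source, that is the tid they last set.
 */
static pid_t spe_pick_tid(uint8_t use_ctx_pkt_for_pid,
			  uint64_t context_id, pid_t current_tid)
{
	if (use_ctx_pkt_for_pid && context_id != (uint64_t)-1)
		return (pid_t)context_id;

	return current_tid;
}
```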

What do you think?

Thanks,
German

[1] https://lore.kernel.org/lkml/20210119144658.793-8-james.clark@xxxxxxx/

On 19/10/2021 13:21, German Gomez wrote:
> Hi Leo,
>
> Many thanks for your comments as always and sorry for the rushed patch.
>
> On 18/10/2021 14:23, Leo Yan wrote:
>> Hi German,
>>
>> On Mon, Oct 18, 2021 at 12:01:27PM +0100, German Gomez wrote:
>>> Hi,
>>>
>>> What do you think of the patch below? PERF_RECORD_SWITCH events are also
>>> included for tracing forks. The patch would sit on top of Namhyung's.
>> Yeah, it's good to add PERF_RECORD_SWITCH.
>>
>>> On 12/10/2021 12:07, German Gomez wrote:
>>>> Hi, Leo and Namhyung,
>>>>
>>>> I want to make sure I'm on the same page as you regarding this topic.
>>>>
>>>> [...]
>>>>
>>>> If we are not considering patching the driver at this stage, and so
>>>> still allow hardware tracing in non-root namespaces, I think we could
>>>> proceed like this:
>>>>
>>>>   - For userspace, always use context-switch events as they are
>>>>     accurate and consistent with namespaces.
>> I don't think you can always use context-switch events for userspace
>> samples. The underlying mechanism is that when a context-switch event
>> or a context packet arrives, perf invokes machine__set_current_tid()
>> to set the current pid/tid; afterwards, we can retrieve the current
>> pid/tid with the function arm_spe_set_pid_tid_cpu().
>>
>> The problem is that if we want to use the tid/pid info from both
>> context-switch events and context packets at the same time, then it's
>> hard to maintain. E.g. we would need to create multiple thread
>> contexts: one to track pid info coming from context-switch events and
>> another to track pid info from context packets.
> My thinking was to use only one of the methods for the entire run, but
> the code below is not expressive enough I'm afraid and I agree it could
> become hard to maintain. I need to polish it up.
>
>> To simplify the code, I still think we should give context packets
>> priority and use them if they're available, and fall back to
>> context-switch events for pid/tid when context packets are not
>> available.
> OK if it simplifies things. I think context-pkt availability can be
> determined before any events are processed by looking at the top record
> in the auxtrace_heap, or any of the auxtrace_queues.
>
>>>>   - For kernel tracing, if context packets are enabled, use them, but
>>>>     warn the user that the PIDs correspond to the root namespace.
>>>>   - Otherwise, use context-switch events and warn the user of the time
>>>>     inaccuracies.
>>>>
>>>> Later, if the driver is patched to disable context packets outside the
>>>> root namespace, kernel tracing could fall back to using context-switch
>>>> events and warn the user with a single message about the time
>>>> inaccuracies.
>>>>
>>>> If we are aligned, we could collect your feedback and share an updated
>>>> patch that considers the warnings.
>>>>
>>>> Many thanks
>>>> Best regards
>>> ---
>>>  tools/perf/util/arm-spe.c | 66 +++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 63 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
>>> index 708323d7c93c..6a2f7a484a80 100644
>>> --- a/tools/perf/util/arm-spe.c
>>> +++ b/tools/perf/util/arm-spe.c
>>> @@ -71,6 +71,17 @@ struct arm_spe {
>>>      u64                kernel_start;
>>>
>>>      unsigned long            num_events;
>>> +
>>> +    /*
>>> +     * Used for PID tracing.
>>> +     */
>>> +    u8                exclude_kernel;
>>> +
>>> +    /*
>>> +     * Warning messages.
>>> +     */
>>> +    u8                warn_context_pkt_namesapce;
>>> +    u8                warn_context_switch_ev_accuracy;
>>>  };
>>>
>>>  struct arm_spe_queue {
>>> @@ -586,11 +597,42 @@ static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
>>>      return timeless_decoding;
>>>  }
>>>
>>> +static bool arm_spe__is_exclude_kernel(struct arm_spe *spe) {
>>> +    struct evsel *evsel;
>>> +    struct evlist *evlist = spe->session->evlist;
>>> +
>>> +    evlist__for_each_entry(evlist, evsel) {
>>> +    if (evsel->core.attr.type == spe->pmu_type && evsel->core.attr.exclude_kernel)
>>> +        return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>  static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
>>>                      struct auxtrace_queue *queue)
>>>  {
>>>      struct arm_spe_queue *speq = queue->priv;
>>> -    pid_t tid;
>>> +    pid_t tid = machine__get_current_tid(spe->machine, speq->cpu);
>>> +    u64 context_id = speq->decoder->record.context_id;
>>> +
>>> +    /*
>>> +    * We're tracing the kernel.
>>> +    */
>>> +    if (!spe->exclude_kernel) {
>> This is incorrect ... 'exclude_kernel' is a global variable and if
>> it's set then perf will always run the code below.
>>
>> I think here you want to avoid using context packets for user space
>> samples, but checking 'exclude_kernel' cannot help for this purpose
>> since 'exclude_kernel' cannot be used to decide the sample mode (kernel
>> mode or user mode).
>>
>> Thanks,
>> Leo
>>
>>> +        /*
>>> +         * Use CONTEXT packets in kernel tracing if available and warn the user of the
>>> +         * values correspond to the root PID namespace.
>>> +         *
>>> +         * If CONTEXT packets aren't available but context-switch events are, warn the user
>>> +         * of the time inaccuracies.
>>> +         */
>>> +        if (context_id != (u64) -1) {
>>> +            tid = speq->decoder->record.context_id;
>>> +            spe->warn_context_pkt_namesapce = true;
>>> +        } else if (tid != -1 && context_id == (u64) -1)
>>> +            spe->warn_context_switch_ev_accuracy = true;
>>> +    }
>>>
>>>      tid = machine__get_current_tid(spe->machine, speq->cpu);
>>>      if (tid != -1) {
>>> @@ -740,7 +782,8 @@ static int arm_spe_process_event(struct perf_session *session,
>>>          if (err)
>>>              return err;
>>>
>>> -        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
>>> +        if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE ||
>>> +            event->header.type == PERF_RECORD_SWITCH)
>>>              err = arm_spe_context_switch(spe, event, sample);
>>>      }
>>>
>>> @@ -807,7 +850,20 @@ static int arm_spe_flush(struct perf_session *session __maybe_unused,
>>>          return arm_spe_process_timeless_queues(spe, -1,
>>>                  MAX_TIMESTAMP - 1);
>>>
>>> -    return arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +    ret = arm_spe_process_queues(spe, MAX_TIMESTAMP);
>>> +
>>> +    if (spe->warn_context_pkt_namesapce)
>>> +        ui__warning(
>>> +            "Arm SPE CONTEXT packets used for PID/TID tracing.\n\n"
>>> +            "PID values correspond to the root PID namespace.\n\n");
>>> +
>>> +    if (spe->warn_context_switch_ev_accuracy)
>>> +        ui__warning(
>>> +            "No Arm SPE CONTEXT packets found within traces.\n\n"
>>> +            "Fallback to PERF_RECORD_SWITCH events for PID/TID tracing will have\n"
>>> +            "workload-dependant timing inaccuracies.\n\n");
>>> +
>>> +    return ret;
>>>  }
>>>
>>>  static void arm_spe_free_queue(void *priv)
>>> @@ -1083,6 +1139,10 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
>>>
>>>      spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
>>>
>>> +    spe->exclude_kernel = arm_spe__is_exclude_kernel(spe);
>>> +    spe->warn_context_pkt_namesapce = false;
>>> +    spe->warn_context_switch_ev_accuracy = false;
>>> +
>>>      /*
>>>       * The synthesized event PERF_RECORD_TIME_CONV has been handled ahead
>>>       * and the parameters for hardware clock are stored in the session
>>> --
>>> 2.17.1
>
> Thanks,
> German
>