Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records

From: James Clark
Date: Mon Jun 28 2021 - 06:38:43 EST




On 28/06/2021 02:27, Leo Yan wrote:
> Hi James,
>
> On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
>> Currently --dump-raw-trace skips queueing and splitting buffers because
>> of an early exit condition in cs_etm__process_auxtrace_info(). Once
>> that is removed we can print the split data by using the queues
>> and searching for split buffers with the same reference as the
>> one that is currently being processed.
>>
>> This keeps the same behaviour of dumping in file order when an AUXTRACE
>> event appears, rather than moving trace dump to where AUX records are in
>> the file.
>>
>> There will be a newline and size printout for each fragment. For example
>> this buffer is comprised of two AUX records, but was printed as one:
>>
>> 0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0 offset: 0 ref: 0x491a4dfc52fc0e6e idx: 0 t
>>
>> . ... CoreSight ETM Trace data: size 160 bytes
>> Idx:0; ID:10; I_ASYNC : Alignment Synchronisation.
>> Idx:12; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>> Idx:17; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>> Idx:80; ID:10; I_ASYNC : Alignment Synchronisation.
>> Idx:92; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>> Idx:97; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
>>
>> But is now printed as two fragments:
>>
>> 0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0 offset: 0 ref: 0x491a4dfc52fc0e6e idx: 0 t
>>
>> . ... CoreSight ETM Trace data: size 80 bytes
>> Idx:0; ID:10; I_ASYNC : Alignment Synchronisation.
>> Idx:12; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>> Idx:17; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>>
>> . ... CoreSight ETM Trace data: size 80 bytes
>> Idx:80; ID:10; I_ASYNC : Alignment Synchronisation.
>> Idx:92; ID:10; I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>> Idx:97; ID:10; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
>>
>> Decoding errors that appeared in problematic files are now not present,
>> for example:
>>
>> Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
>> ...
>> PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
>>
>> Signed-off-by: James Clark <james.clark@xxxxxxx>
>
> I tested this patch and the result looks good for me.
>
> I found a side effect introduced by this change is the perf raw dump
> also synthesizes events PERF_RECORD_ATTR. This is because function
> cs_etm__synth_events() will execute after applying this patch and it
> synthesizes PERF_RECORD_ATTR events. I don't see any harm for this,
> so:
>
> Tested-by: Leo Yan <leo.yan@xxxxxxxxxx>
>

Thanks for the testing!

> Please see an extra comment in below.
>
>> ---
>> tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 88e8122f73c9..ad777c2a342f 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -2430,6 +2430,22 @@ static int cs_etm__process_event(struct perf_session *session,
>> return 0;
>> }
>>
>> +static void dump_queued_data(struct cs_etm_auxtrace *etm,
>> + struct perf_record_auxtrace *event)
>> +{
>> + struct auxtrace_buffer *buf;
>> + unsigned int i;
>> + /*
>> + * Find all buffers with same reference in the queues and dump them.
>> + * This is because the queues can contain multiple entries of the same
>> + * buffer that were split on aux records.
>> + */
>> + for (i = 0; i < etm->queues.nr_queues; ++i)
>> + list_for_each_entry(buf, &etm->queues.queue_array[i].head, list)
>> + if (buf->reference == event->reference)
>> + cs_etm__dump_event(etm, buf);
>> +}
>> +
>> static int cs_etm__process_auxtrace_event(struct perf_session *session,
>> union perf_event *event,
>> struct perf_tool *tool __maybe_unused)
>> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
>> cs_etm__dump_event(etm, buffer);
>> auxtrace_buffer__put_data(buffer);
>> }
>> - }
>> + } else if (dump_trace)
>> + dump_queued_data(etm, &event->auxtrace);
>
> IIUC, in the function cs_etm__process_auxtrace_event(), since
> "etm->data_queued" is always true, below flow will never run:
>
> if (!etm->data_queued) {
> ......
>
> if (dump_trace)
> if (auxtrace_buffer__get_data(buffer, fd)) {
> cs_etm__dump_event(etm, buffer);
> auxtrace_buffer__put_data(buffer);
> }
> }
>
> If so, it's better to use a new patch to polish the code.
>

Hi Leo,

I think this is not true in piped mode because there is no auxtrace index.
In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
is called for each buffer.

You can reproduce this with something like this:

./perf record -o - ls > stdio.data
cat stdio.data | ./perf report -i -

There are some other Coresight features that don't work as expected in this mode, like
sorting timestamps between CPUs. The aux split patchset won't work either because random
access isn't possible. And the TRBE patch that I'm working on now won't work, because it
also requires the random access to lookup the flags on the AUX record to configure the
decoder for unformatted trace.


Thanks
James

> Thanks,
> Leo
>
>>
>> return 0;
>> }
>> @@ -3038,7 +3055,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>>
>> if (dump_trace) {
>> cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
>> - return 0;
>> }
>>
>> err = cs_etm__synth_events(etm, session);
>> --
>> 2.28.0
>>