Re: [PATCH 1/2] perf arm-spe: Add arm_spe_record to synthesized

From: German Gomez
Date: Wed Jan 26 2022 - 14:08:23 EST



On 26/01/2022 15:58, Ali Saidi wrote:
> Hi German,
>> Hi Ali,
>>
>> On 25/01/2022 19:20, Ali Saidi wrote:
>>> [...]
>>>
>>> + sample.raw_size = sizeof(*record);
>>> + sample.raw_data = record;
>> Have you tried this with perf-inject? I think it would need the PERF_SAMPLE_RAW bit in the sample_type,
> Yes I've tried the following and it worked as expected with the original
> perf.data or the perf.data.jitted after perf-inject.
>
> perf record -e arm_spe_0/jitter=1/ -k 1 java ...
> perf inject -f --jit -i perf.data -o perf.data.jitted

This is not injecting the synthesized samples. I think it is still    
processing from the aux trace. Try adding "--itrace=i1i --strip" to the
inject command to remove the AUXTRACE events. Judging by the raw
samples, the data is missing:

11 20005239445831 0x3510 [0x94]: PERF_RECORD_SAMPLE(IP, 0x1): 47670/47670: 0xffffab6e02d1b320 period: 1 addr: 0

. ... raw event: size 64 bytes
.  0000:  09 00 00 00 01 00 40 00 f3 10 9b 3b 00 00 00 00  ......@....;....
.  0010:  98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00  ....n...6...6...
.  0020:  35 da 30 d5 31 12 00 00 0b 00 00 00 00 00 00 00  5.0.1...........
.  0030:  01 00 00 00 00 00 00 00 82 01 00 88 00 00 00 00  ................

vs when adding PERF_SAMPLE_RAW to attr.sample_type:      

. ... raw event: size 148 bytes
.  0000:  09 00 00 00 01 00 94 00 f3 10 9b 3b 00 00 00 00  ...........;....
.  0010:  98 c5 e3 02 6e ab ff ff 36 ba 00 00 36 ba 00 00  ....n...6...6...
[...]
.  0080:  00 00 00 00 2a 00 00 00 00 00 00 00 82 01 00 88  ....*...........
.  0090:  00 00 00 00

> perf script -i perf.data -s t1.py --itrace=i1i
>
>> Although I quickly looked over the perf inject code and it looks like it's expecting some type of padding:
>>
>> [...]
>>
>> I'm seeing some comments in utils/event.h related to this on the intel events.
> Yes i noticed this too,but looking at how the raw data is added to the same
> other places like intel-pt.c:1703 the perf_synth__raw*() functions are used to
> strip away the 4 bytes bytes before the data is added to the sample. The other
> places i can find the padding used is in builtin-script.c but given we have the
> --dump-raw-trace option it's not clear to me that it's needed to wrap the
> arm_spe_event in another struct with padding like perf_synth_intel_ptwrite?

I think the intel use case makes sense because the layout of the data
is fixed and documented. If we modify the struct arm_spe_record later it
may not be obvious how to match it to the raw data of an older perf.data
file. And we're generating bigger files with redundant information.

Thanks,
German

>
> Thanks,
> Ali
>