Re: [PATCH] perf stat: Support per-cluster aggregation
From: Yicong Yang
Date: Mon Mar 27 2023 - 00:04:46 EST
Hi Tim,
On 2023/3/25 2:05, Chen, Tim C wrote:
>>
>> From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>>
>> Some platforms have 'cluster' topology and CPUs in the cluster will share
>> resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel
>> Jacobsville). Currently parsing and building cluster topology have been
>> supported since [1].
>>
>> perf stat has already supported aggregation for other topologies like die or
>> socket, etc. It'll be useful to aggregate per-cluster to find problems like L3T
>> bandwidth contention or imbalance.
>>
>> This patch adds support for "--per-cluster" option for per-cluster aggregation.
>> Also update the docs and related test. The output will be like:
>>
>> [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S56-D0-CLS158       4      1,321,521,570      LLC-load
>> S56-D0-CLS594       4        794,211,453      LLC-load
>> S56-D0-CLS1030      4             41,623      LLC-load
>> S56-D0-CLS1466      4             41,646      LLC-load
>> S56-D0-CLS1902      4             16,863      LLC-load
>> S56-D0-CLS2338      4             15,721      LLC-load
>> S56-D0-CLS2774      4             22,671      LLC-load
>> [...]
>
> Overall it looks good. You can add my reviewed-by.
>
thanks.
> I wonder if we could enhance the help message
> in perf stat to tell user to refer to
> /sys/devices/system/cpu/cpuX/topology/*_id
> to map relevant ids back to overall cpu topology.
>
> For example the above example, cluster S56-D0-CLS158 has
> really heavy load. It took me a while
> going through the code to figure out how to find
> the info that maps cluster id to cpu.
>
Yes, indeed. Actually this is because my BIOS doesn't report a valid
ID for these topologies, so ACPI uses the offset of the topology
node in the PPTT as a fallback. Other topologies suffer from the same issue.
On my machine:
[root@localhost debug]# perf stat --per-socket -e cycles -a -- sleep 1
Performance counter stats for 'system wide':
S56        64         21,563,375      cycles
S7182      64         32,140,641      cycles
1.008520310 seconds time elapsed
On x86:
root@ubuntu204:/home/yang/linux/tools/perf# ./perf stat -a --per-socket -e cycles -- sleep 1
Performance counter stats for 'system wide':
S0         40        137,205,897      cycles
S1         40         67,720,731      cycles
1.003546720 seconds time elapsed
Maybe I can add a separate patch to the perf manual describing where the
topology IDs come from.
Thanks,
Yicong