Re: [PATCH] perf stat: Support per-cluster aggregation

From: Namhyung Kim
Date: Wed Mar 29 2023 - 02:47:41 EST


Hello,

On Fri, Mar 24, 2023 at 11:09 AM Chen, Tim C <tim.c.chen@xxxxxxxxx> wrote:
>
> >
> > From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> >
> >Some platforms have a 'cluster' topology in which the CPUs of a cluster
> >share resources such as the L3 Cache Tag (on HiSilicon Kunpeng SoCs) or the
> >L2 cache (on Intel Jacobsville). Parsing and building the cluster topology
> >has been supported since [1].
> >
> >perf stat already supports aggregation for other topologies such as die and
> >socket. Aggregating per cluster is useful for finding problems like L3T
> >bandwidth contention or imbalance.
> >
> >This patch adds a "--per-cluster" option for per-cluster aggregation and
> >updates the docs and the related test. The output looks like:
> >
> >[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
> >
> > Performance counter stats for 'system wide':
> >
> >S56-D0-CLS158 4 1,321,521,570 LLC-load
> >S56-D0-CLS594 4 794,211,453 LLC-load
> >S56-D0-CLS1030 4 41,623 LLC-load
> >S56-D0-CLS1466 4 41,646 LLC-load
> >S56-D0-CLS1902 4 16,863 LLC-load
> >S56-D0-CLS2338 4 15,721 LLC-load
> >S56-D0-CLS2774 4 22,671 LLC-load
> >[...]
>
> Overall it looks good. You can add my Reviewed-by.
>
> I wonder if we could enhance the help message
> in perf stat to tell users to refer to
> /sys/devices/system/cpu/cpuX/topology/*_id
> to map the reported ids back to the overall CPU topology.
>
> For example, in the output above, cluster S56-D0-CLS158
> has a really heavy load. It took me a while, going
> through the code, to figure out where to find
> the information that maps a cluster id to its CPUs.
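
A minimal sketch of the lookup Tim describes, in C. It walks cpuN/topology/ under a caller-supplied root, which would normally be /sys/devices/system/cpu. It collects the CPUs whose physical_package_id and cluster_id match the S<n> and CLS<n> parts of a row. The helper names and the fixed upper bound on CPU numbers are hypothetical. The sketch also ignores die_id and silently skips any CPU whose files cannot be read.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer attribute from <root>/cpu<cpu>/topology/<attr>.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_topology_id(const char *root, int cpu, const char *attr, int *val)
{
	char path[512];
	FILE *f;
	int ok;

	assert(root != NULL && attr != NULL && val != NULL);
	snprintf(path, sizeof(path), "%s/cpu%d/topology/%s", root, cpu, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%d", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Scan CPUs 0..max_cpu-1 and store those in the given socket and cluster
 * into out[] (at most max_out entries). Returns the total number matched.
 * CPUs whose topology files are unreadable are skipped. */
static int find_cluster_cpus(const char *root, int max_cpu, int socket,
			     int cluster, int *out, int max_out)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < max_cpu; cpu++) {
		int s, c;

		if (read_topology_id(root, cpu, "physical_package_id", &s) ||
		    read_topology_id(root, cpu, "cluster_id", &c))
			continue;
		if (s != socket || c != cluster)
			continue;
		if (n < max_out)
			out[n] = cpu;
		n++;
	}
	return n;
}
```

For the heavy row above, one would call find_cluster_cpus("/sys/devices/system/cpu", 1024, 56, 158, cpus, 1024). This assumes the S56 and CLS158 numbers are the raw physical_package_id and cluster_id values on that machine.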

Maybe we could enhance the cpu filter to accept something
like -C S56-D0-CLS158.
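
If -C were extended this way, the first step would be to split the aggregation label back into ids. The following is a hypothetical parser for the "S<socket>-D<die>-CLS<cluster>" form printed by --per-cluster. The struct and function names are illustrative only and are not part of perf.

```c
#include <assert.h>
#include <stdio.h>

/* Ids encoded in a --per-cluster label such as "S56-D0-CLS158". */
struct cluster_label {
	int socket;
	int die;
	int cluster;
};

/* Parse "S<socket>-D<die>-CLS<cluster>" with nothing trailing.
 * Returns 0 on success and -1 if the string has any other shape. */
static int parse_cluster_label(const char *str, struct cluster_label *out)
{
	int end = 0;

	assert(str != NULL && out != NULL);
	/* %n records how many characters were consumed, so trailing junk
	 * such as "S56-D0-CLS158x" can be rejected below. */
	if (sscanf(str, "S%d-D%d-CLS%d%n", &out->socket, &out->die,
		   &out->cluster, &end) != 3)
		return -1;
	return str[end] == '\0' ? 0 : -1;
}
```

Because anything that does not match returns -1, the existing CPU-list syntax (e.g. -C 0-3) could still be tried as a fallback.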

I also wonder what happens if it runs on an old kernel that doesn't
have the cluster_id file.
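
One hypothetical way to notice this up front is to probe for the attribute with access(2) before accepting --per-cluster. The helper name and the choice to probe a single CPU are assumptions. What perf should do once the file turns out to be missing is still the open question above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if <root>/cpu<cpu>/topology/cluster_id exists and is readable,
 * 0 otherwise (e.g. on a kernel without the cluster_id attribute). */
static int has_cluster_id(const char *root, int cpu)
{
	char path[512];

	assert(root != NULL);
	snprintf(path, sizeof(path), "%s/cpu%d/topology/cluster_id", root, cpu);
	return access(path, R_OK) == 0;
}
```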

Thanks,
Namhyung