Re: [RESEND v4] perf stat: Support per-cluster aggregation

From: Yicong Yang
Date: Tue Feb 06 2024 - 04:39:17 EST


On 2024/2/6 17:16, Jonathan Cameron wrote:
> On Tue, 6 Feb 2024 16:20:16 +0800
> Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
>
>> From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>>
>> Some platforms have a 'cluster' topology where the CPUs in a cluster
>> share resources such as the L3 cache tag (on HiSilicon Kunpeng SoC) or
>> the L2 cache (on Intel Jacobsville). Parsing and building the cluster
>> topology have been supported since [1].
>>
>> perf stat already supports aggregation for other topologies like die
>> or socket. It'll be useful to aggregate per-cluster as well, to find
>> problems like L3T bandwidth contention.
>>
>> This patch adds support for the "--per-cluster" option for per-cluster
>> aggregation, and updates the docs and the related tests. The output
>> will look like:
>>
>> [root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
>>
>> Performance counter stats for 'system wide':
>>
>> S56-D0-CLS158 4 1,321,521,570 LLC-load
>> S56-D0-CLS594 4 794,211,453 LLC-load
>> S56-D0-CLS1030 4 41,623 LLC-load
>> S56-D0-CLS1466 4 41,646 LLC-load
>> S56-D0-CLS1902 4 16,863 LLC-load
>> S56-D0-CLS2338 4 15,721 LLC-load
>> S56-D0-CLS2774 4 22,671 LLC-load
>> [...]
>>
>> On a legacy system without clusters or without cluster support, the
>> output will look like:
>> [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>> S56-D0-CLS0 64 18,011,485 cycles
>> S7182-D0-CLS0 64 16,548,835 cycles
>>
>> Note that this patch doesn't mix the cluster information into the
>> output of --per-core, to avoid breaking any tools/scripts using it.
>>
>> Note that perf recently gained "--per-cache" aggregation, but that is
>> not the same as per-cluster, although the CPUs in a cluster may share
>> some cache resources. For example, on my machine all clusters within a
>> die share the same L3 cache:
>> $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
>> 0-31
>> $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
>> 0-3
>>
>> [1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")
>> Tested-by: Jie Zhan <zhanjie9@xxxxxxxxxxxxx>
>> Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
>> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>> ---
>> Change since v3:
>> - Rebase on v6.7-rc4 and resolve the conflicts
>> Link: https://lore.kernel.org/all/20230404104951.27537-1-yangyicong@xxxxxxxxxx/
>>
>> Change since v2:
>> - Use 0 as the cluster ID on legacy systems without cluster support, keeping
>> consistent with what --per-die does.
>> Link: https://lore.kernel.org/all/20230328112717.19573-1-yangyicong@xxxxxxxxxx/
>>
>> Change since v1:
>> - Provide information in the manual about how to map the cluster to the CPUs
> Given this change incorporates both the case in the example above, where the
> PPTT doesn't have the IDs set for the Processor Hierarchy nodes, and the one
> where it does (in which case they would be the UIDs of the Processor
> Containers in the DSDT), I think this is sufficient.
>

Yes, perf only reads the IDs from sysfs, and if the firmware fills in the IDs
more meaningfully we should get a more readable output like:
[root@localhost yang]# ./perf stat -e ll_cache_miss --per-cluster --timeout 1000

Performance counter stats for 'system wide':

S0-D0-CLS0 4 944838 ll_cache_miss
S0-D0-CLS1 4 4067300 ll_cache_miss
S0-D0-CLS2 4 14804515 ll_cache_miss
S0-D0-CLS3 4 11664175 ll_cache_miss
S0-D0-CLS4 4 11301486 ll_cache_miss
S0-D0-CLS5 4 11300000 ll_cache_miss
S0-D0-CLS6 4 1996552 ll_cache_miss
[...]
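
For reference, with this patch the cluster ID is simply read per CPU from
/sys/devices/system/cpu/cpuX/topology/cluster_id (through cpu__get_topology_int()
in the hunks below), and a missing or -1 ID is treated as cluster 0 on legacy
systems. A rough standalone sketch of that lookup, just for illustration (the
read_cluster_id() helper below is a made-up name, not something from the patch):

#include <stdio.h>

/*
 * Read a CPU's cluster ID from sysfs the same way the patch does, and
 * fall back to 0 when the attribute is absent or reads as -1, i.e. on
 * a legacy system without cluster support.
 */
static int read_cluster_id(int cpu)
{
	char path[128];
	FILE *fp;
	int id = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/cluster_id", cpu);
	fp = fopen(path, "r");
	if (fp) {
		if (fscanf(fp, "%d", &id) != 1)
			id = -1;
		fclose(fp);
	}

	return id == -1 ? 0 : id;
}

int main(void)
{
	printf("cpu0 cluster id: %d\n", read_cluster_id(0));
	return 0;
}

On the machine above this prints the firmware-provided ID for each CPU, while on
a pre-cluster system it prints 0 everywhere, which is why the legacy example in
the commit message shows CLS0 for every socket.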

> Not an expert on perftool but in general this looks both useful and correct to me.
>

Thanks.

> Jonathan
>
>> - Thanks for the review from Tim and the test from Jie.
>> Link: https://lore.kernel.org/all/20230313085911.61359-1-yangyicong@xxxxxxxxxx/
>>
>> tools/perf/Documentation/perf-stat.txt | 11 ++++
>> tools/perf/builtin-stat.c | 52 +++++++++++++++++--
>> .../tests/shell/lib/perf_json_output_lint.py | 4 +-
>> tools/perf/tests/shell/lib/stat_output.sh | 12 +++++
>> tools/perf/tests/shell/stat+csv_output.sh | 2 +
>> tools/perf/tests/shell/stat+json_output.sh | 13 +++++
>> tools/perf/tests/shell/stat+std_output.sh | 2 +
>> tools/perf/util/cpumap.c | 32 +++++++++++-
>> tools/perf/util/cpumap.h | 19 +++++--
>> tools/perf/util/env.h | 1 +
>> tools/perf/util/stat-display.c | 13 +++++
>> tools/perf/util/stat.h | 1 +
>> 12 files changed, 153 insertions(+), 9 deletions(-)
>>
>> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
>> index 5af2e432b54f..29756a87ab6f 100644
>> --- a/tools/perf/Documentation/perf-stat.txt
>> +++ b/tools/perf/Documentation/perf-stat.txt
>> @@ -308,6 +308,14 @@ use --per-die in addition to -a. (system-wide). The output includes the
>> die number and the number of online processors on that die. This is
>> useful to gauge the amount of aggregation.
>>
>> +--per-cluster::
>> +Aggregate counts per processor cluster for system-wide mode measurement. This
>> +is a useful mode to detect imbalance between clusters. To enable this mode,
>> +use --per-cluster in addition to -a. (system-wide). The output includes the
>> +cluster number and the number of online processors on that cluster. This is
>> +useful to gauge the amount of aggregation. The information of cluster ID and
>> +related CPUs can be gotten from /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}.
>> +
>> --per-cache::
>> Aggregate counts per cache instance for system-wide mode measurements. By
>> default, the aggregation happens for the cache level at the highest index
>> @@ -396,6 +404,9 @@ Aggregate counts per processor socket for system-wide mode measurements.
>> --per-die::
>> Aggregate counts per processor die for system-wide mode measurements.
>>
>> +--per-cluster::
>> +Aggregate counts per processor cluster for system-wide mode measurements.
>> +
>> --per-cache::
>> Aggregate counts per cache instance for system-wide mode measurements. By
>> default, the aggregation happens for the cache level at the highest index
>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>> index 5fe9abc6a524..6bba1a89d030 100644
>> --- a/tools/perf/builtin-stat.c
>> +++ b/tools/perf/builtin-stat.c
>> @@ -1238,6 +1238,8 @@ static struct option stat_options[] = {
>> "aggregate counts per processor socket", AGGR_SOCKET),
>> OPT_SET_UINT(0, "per-die", &stat_config.aggr_mode,
>> "aggregate counts per processor die", AGGR_DIE),
>> + OPT_SET_UINT(0, "per-cluster", &stat_config.aggr_mode,
>> + "aggregate counts per processor cluster", AGGR_CLUSTER),
>> OPT_CALLBACK_OPTARG(0, "per-cache", &stat_config.aggr_mode, &stat_config.aggr_level,
>> "cache level", "aggregate count at this cache level (Default: LLC)",
>> parse_cache_level),
>> @@ -1428,6 +1430,7 @@ static struct aggr_cpu_id aggr_cpu_id__cache(struct perf_cpu cpu, void *data)
>> static const char *const aggr_mode__string[] = {
>> [AGGR_CORE] = "core",
>> [AGGR_CACHE] = "cache",
>> + [AGGR_CLUSTER] = "cluster",
>> [AGGR_DIE] = "die",
>> [AGGR_GLOBAL] = "global",
>> [AGGR_NODE] = "node",
>> @@ -1455,6 +1458,12 @@ static struct aggr_cpu_id perf_stat__get_cache_id(struct perf_stat_config *confi
>> return aggr_cpu_id__cache(cpu, /*data=*/NULL);
>> }
>>
>> +static struct aggr_cpu_id perf_stat__get_cluster(struct perf_stat_config *config __maybe_unused,
>> + struct perf_cpu cpu)
>> +{
>> + return aggr_cpu_id__cluster(cpu, /*data=*/NULL);
>> +}
>> +
>> static struct aggr_cpu_id perf_stat__get_core(struct perf_stat_config *config __maybe_unused,
>> struct perf_cpu cpu)
>> {
>> @@ -1507,6 +1516,12 @@ static struct aggr_cpu_id perf_stat__get_die_cached(struct perf_stat_config *con
>> return perf_stat__get_aggr(config, perf_stat__get_die, cpu);
>> }
>>
>> +static struct aggr_cpu_id perf_stat__get_cluster_cached(struct perf_stat_config *config,
>> + struct perf_cpu cpu)
>> +{
>> + return perf_stat__get_aggr(config, perf_stat__get_cluster, cpu);
>> +}
>> +
>> static struct aggr_cpu_id perf_stat__get_cache_id_cached(struct perf_stat_config *config,
>> struct perf_cpu cpu)
>> {
>> @@ -1544,6 +1559,8 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>> return aggr_cpu_id__socket;
>> case AGGR_DIE:
>> return aggr_cpu_id__die;
>> + case AGGR_CLUSTER:
>> + return aggr_cpu_id__cluster;
>> case AGGR_CACHE:
>> return aggr_cpu_id__cache;
>> case AGGR_CORE:
>> @@ -1569,6 +1586,8 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
>> return perf_stat__get_socket_cached;
>> case AGGR_DIE:
>> return perf_stat__get_die_cached;
>> + case AGGR_CLUSTER:
>> + return perf_stat__get_cluster_cached;
>> case AGGR_CACHE:
>> return perf_stat__get_cache_id_cached;
>> case AGGR_CORE:
>> @@ -1737,6 +1756,21 @@ static struct aggr_cpu_id perf_env__get_cache_aggr_by_cpu(struct perf_cpu cpu,
>> return id;
>> }
>>
>> +static struct aggr_cpu_id perf_env__get_cluster_aggr_by_cpu(struct perf_cpu cpu,
>> + void *data)
>> +{
>> + struct perf_env *env = data;
>> + struct aggr_cpu_id id = aggr_cpu_id__empty();
>> +
>> + if (cpu.cpu != -1) {
>> + id.socket = env->cpu[cpu.cpu].socket_id;
>> + id.die = env->cpu[cpu.cpu].die_id;
>> + id.cluster = env->cpu[cpu.cpu].cluster_id;
>> + }
>> +
>> + return id;
>> +}
>> +
>> static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, void *data)
>> {
>> struct perf_env *env = data;
>> @@ -1744,12 +1778,12 @@ static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, vo
>>
>> if (cpu.cpu != -1) {
>> /*
>> - * core_id is relative to socket and die,
>> - * we need a global id. So we set
>> - * socket, die id and core id
>> + * core_id is relative to socket, die and cluster, we need a
>> + * global id. So we set socket, die id, cluster id and core id.
>> */
>> id.socket = env->cpu[cpu.cpu].socket_id;
>> id.die = env->cpu[cpu.cpu].die_id;
>> + id.cluster = env->cpu[cpu.cpu].cluster_id;
>> id.core = env->cpu[cpu.cpu].core_id;
>> }
>>
>> @@ -1805,6 +1839,12 @@ static struct aggr_cpu_id perf_stat__get_die_file(struct perf_stat_config *confi
>> return perf_env__get_die_aggr_by_cpu(cpu, &perf_stat.session->header.env);
>> }
>>
>> +static struct aggr_cpu_id perf_stat__get_cluster_file(struct perf_stat_config *config __maybe_unused,
>> + struct perf_cpu cpu)
>> +{
>> + return perf_env__get_cluster_aggr_by_cpu(cpu, &perf_stat.session->header.env);
>> +}
>> +
>> static struct aggr_cpu_id perf_stat__get_cache_file(struct perf_stat_config *config __maybe_unused,
>> struct perf_cpu cpu)
>> {
>> @@ -1842,6 +1882,8 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
>> return perf_env__get_socket_aggr_by_cpu;
>> case AGGR_DIE:
>> return perf_env__get_die_aggr_by_cpu;
>> + case AGGR_CLUSTER:
>> + return perf_env__get_cluster_aggr_by_cpu;
>> case AGGR_CACHE:
>> return perf_env__get_cache_aggr_by_cpu;
>> case AGGR_CORE:
>> @@ -1867,6 +1909,8 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
>> return perf_stat__get_socket_file;
>> case AGGR_DIE:
>> return perf_stat__get_die_file;
>> + case AGGR_CLUSTER:
>> + return perf_stat__get_cluster_file;
>> case AGGR_CACHE:
>> return perf_stat__get_cache_file;
>> case AGGR_CORE:
>> @@ -2398,6 +2442,8 @@ static int __cmd_report(int argc, const char **argv)
>> "aggregate counts per processor socket", AGGR_SOCKET),
>> OPT_SET_UINT(0, "per-die", &perf_stat.aggr_mode,
>> "aggregate counts per processor die", AGGR_DIE),
>> + OPT_SET_UINT(0, "per-cluster", &perf_stat.aggr_mode,
>> + "aggregate counts perf processor cluster", AGGR_CLUSTER),
>> OPT_CALLBACK_OPTARG(0, "per-cache", &perf_stat.aggr_mode, &perf_stat.aggr_level,
>> "cache level",
>> "aggregate count at this cache level (Default: LLC)",
>> diff --git a/tools/perf/tests/shell/lib/perf_json_output_lint.py b/tools/perf/tests/shell/lib/perf_json_output_lint.py
>> index ea55d5ea1ced..abc1fd737782 100644
>> --- a/tools/perf/tests/shell/lib/perf_json_output_lint.py
>> +++ b/tools/perf/tests/shell/lib/perf_json_output_lint.py
>> @@ -15,6 +15,7 @@ ap.add_argument('--event', action='store_true')
>> ap.add_argument('--per-core', action='store_true')
>> ap.add_argument('--per-thread', action='store_true')
>> ap.add_argument('--per-cache', action='store_true')
>> +ap.add_argument('--per-cluster', action='store_true')
>> ap.add_argument('--per-die', action='store_true')
>> ap.add_argument('--per-node', action='store_true')
>> ap.add_argument('--per-socket', action='store_true')
>> @@ -49,6 +50,7 @@ def check_json_output(expected_items):
>> 'cgroup': lambda x: True,
>> 'cpu': lambda x: isint(x),
>> 'cache': lambda x: True,
>> + 'cluster': lambda x: True,
>> 'die': lambda x: True,
>> 'event': lambda x: True,
>> 'event-runtime': lambda x: isfloat(x),
>> @@ -88,7 +90,7 @@ try:
>> expected_items = 7
>> elif args.interval or args.per_thread or args.system_wide_no_aggr:
>> expected_items = 8
>> - elif args.per_core or args.per_socket or args.per_node or args.per_die or args.per_cache:
>> + elif args.per_core or args.per_socket or args.per_node or args.per_die or args.per_cluster or args.per_cache:
>> expected_items = 9
>> else:
>> # If no option is specified, don't check the number of items.
>> diff --git a/tools/perf/tests/shell/lib/stat_output.sh b/tools/perf/tests/shell/lib/stat_output.sh
>> index 3cc158a64326..c81d6a9f7983 100644
>> --- a/tools/perf/tests/shell/lib/stat_output.sh
>> +++ b/tools/perf/tests/shell/lib/stat_output.sh
>> @@ -97,6 +97,18 @@ check_per_cache_instance()
>> echo "[Success]"
>> }
>>
>> +check_per_cluster()
>> +{
>> + echo -n "Checking $1 output: per cluster "
>> + if ParanoidAndNotRoot 0
>> + then
>> + echo "[Skip] paranoid and not root"
>> + return
>> + fi
>> + perf stat --per-cluster -a $2 true
>> + echo "[Success]"
>> +}
>> +
>> check_per_die()
>> {
>> echo -n "Checking $1 output: per die "
>> diff --git a/tools/perf/tests/shell/stat+csv_output.sh b/tools/perf/tests/shell/stat+csv_output.sh
>> index f1818fa6d9ce..fc2d8cc6e5e0 100755
>> --- a/tools/perf/tests/shell/stat+csv_output.sh
>> +++ b/tools/perf/tests/shell/stat+csv_output.sh
>> @@ -42,6 +42,7 @@ function commachecker()
>> ;; "--per-socket") exp=8
>> ;; "--per-node") exp=8
>> ;; "--per-die") exp=8
>> + ;; "--per-cluster") exp=8
>> ;; "--per-cache") exp=8
>> esac
>>
>> @@ -79,6 +80,7 @@ then
>> check_system_wide_no_aggr "CSV" "$perf_cmd"
>> check_per_core "CSV" "$perf_cmd"
>> check_per_cache_instance "CSV" "$perf_cmd"
>> + check_per_cluster "CSV" "$perf_cmd"
>> check_per_die "CSV" "$perf_cmd"
>> check_per_socket "CSV" "$perf_cmd"
>> else
>> diff --git a/tools/perf/tests/shell/stat+json_output.sh b/tools/perf/tests/shell/stat+json_output.sh
>> index 3bc900533a5d..2b9c6212dffc 100755
>> --- a/tools/perf/tests/shell/stat+json_output.sh
>> +++ b/tools/perf/tests/shell/stat+json_output.sh
>> @@ -122,6 +122,18 @@ check_per_cache_instance()
>> echo "[Success]"
>> }
>>
>> +check_per_cluster()
>> +{
>> + echo -n "Checking json output: per cluster "
>> + if ParanoidAndNotRoot 0
>> + then
>> + echo "[Skip] paranoia and not root"
>> + return
>> + fi
>> + perf stat -j --per-cluster -a true 2>&1 | $PYTHON $pythonchecker --per-cluster
>> + echo "[Success]"
>> +}
>> +
>> check_per_die()
>> {
>> echo -n "Checking json output: per die "
>> @@ -200,6 +212,7 @@ then
>> check_system_wide_no_aggr
>> check_per_core
>> check_per_cache_instance
>> + check_per_cluster
>> check_per_die
>> check_per_socket
>> else
>> diff --git a/tools/perf/tests/shell/stat+std_output.sh b/tools/perf/tests/shell/stat+std_output.sh
>> index 4fcdd1a9142c..16f61e86afc5 100755
>> --- a/tools/perf/tests/shell/stat+std_output.sh
>> +++ b/tools/perf/tests/shell/stat+std_output.sh
>> @@ -40,6 +40,7 @@ function commachecker()
>> ;; "--per-node") prefix=3
>> ;; "--per-die") prefix=3
>> ;; "--per-cache") prefix=3
>> + ;; "--per-cluster") prefix=3
>> esac
>>
>> while read line
>> @@ -99,6 +100,7 @@ then
>> check_system_wide_no_aggr "STD" "$perf_cmd"
>> check_per_core "STD" "$perf_cmd"
>> check_per_cache_instance "STD" "$perf_cmd"
>> + check_per_cluster "STD" "$perf_cmd"
>> check_per_die "STD" "$perf_cmd"
>> check_per_socket "STD" "$perf_cmd"
>> else
>> diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
>> index 0581ee0fa5f2..5907456d42a2 100644
>> --- a/tools/perf/util/cpumap.c
>> +++ b/tools/perf/util/cpumap.c
>> @@ -222,6 +222,8 @@ static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer)
>> return a->socket - b->socket;
>> else if (a->die != b->die)
>> return a->die - b->die;
>> + else if (a->cluster != b->cluster)
>> + return a->cluster - b->cluster;
>> else if (a->cache_lvl != b->cache_lvl)
>> return a->cache_lvl - b->cache_lvl;
>> else if (a->cache != b->cache)
>> @@ -309,6 +311,29 @@ struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data)
>> return id;
>> }
>>
>> +int cpu__get_cluster_id(struct perf_cpu cpu)
>> +{
>> + int value, ret = cpu__get_topology_int(cpu.cpu, "cluster_id", &value);
>> + return ret ?: value;
>> +}
>> +
>> +struct aggr_cpu_id aggr_cpu_id__cluster(struct perf_cpu cpu, void *data)
>> +{
>> + int cluster = cpu__get_cluster_id(cpu);
>> + struct aggr_cpu_id id;
>> +
>> + /* There is no cluster_id on legacy system. */
>> + if (cluster == -1)
>> + cluster = 0;
>> +
>> + id = aggr_cpu_id__die(cpu, data);
>> + if (aggr_cpu_id__is_empty(&id))
>> + return id;
>> +
>> + id.cluster = cluster;
>> + return id;
>> +}
>> +
>> int cpu__get_core_id(struct perf_cpu cpu)
>> {
>> int value, ret = cpu__get_topology_int(cpu.cpu, "core_id", &value);
>> @@ -320,8 +345,8 @@ struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data)
>> struct aggr_cpu_id id;
>> int core = cpu__get_core_id(cpu);
>>
>> - /* aggr_cpu_id__die returns a struct with socket and die set. */
>> - id = aggr_cpu_id__die(cpu, data);
>> + /* aggr_cpu_id__cluster returns a struct with socket, die and cluster set. */
>> + id = aggr_cpu_id__cluster(cpu, data);
>> if (aggr_cpu_id__is_empty(&id))
>> return id;
>>
>> @@ -683,6 +708,7 @@ bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b
>> a->node == b->node &&
>> a->socket == b->socket &&
>> a->die == b->die &&
>> + a->cluster == b->cluster &&
>> a->cache_lvl == b->cache_lvl &&
>> a->cache == b->cache &&
>> a->core == b->core &&
>> @@ -695,6 +721,7 @@ bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a)
>> a->node == -1 &&
>> a->socket == -1 &&
>> a->die == -1 &&
>> + a->cluster == -1 &&
>> a->cache_lvl == -1 &&
>> a->cache == -1 &&
>> a->core == -1 &&
>> @@ -708,6 +735,7 @@ struct aggr_cpu_id aggr_cpu_id__empty(void)
>> .node = -1,
>> .socket = -1,
>> .die = -1,
>> + .cluster = -1,
>> .cache_lvl = -1,
>> .cache = -1,
>> .core = -1,
>> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
>> index 9df2aeb34d3d..26cf76c693f5 100644
>> --- a/tools/perf/util/cpumap.h
>> +++ b/tools/perf/util/cpumap.h
>> @@ -20,6 +20,8 @@ struct aggr_cpu_id {
>> int socket;
>> /** The die id as read from /sys/devices/system/cpu/cpuX/topology/die_id. */
>> int die;
>> + /** The cluster id as read from /sys/devices/system/cpu/cpuX/topology/cluster_id */
>> + int cluster;
>> /** The cache level as read from /sys/devices/system/cpu/cpuX/cache/indexY/level */
>> int cache_lvl;
>> /**
>> @@ -86,6 +88,11 @@ int cpu__get_socket_id(struct perf_cpu cpu);
>> * /sys/devices/system/cpu/cpuX/topology/die_id for the given CPU.
>> */
>> int cpu__get_die_id(struct perf_cpu cpu);
>> +/**
>> + * cpu__get_cluster_id - Returns the cluster id as read from
>> + * /sys/devices/system/cpu/cpuX/topology/cluster_id for the given CPU
>> + */
>> +int cpu__get_cluster_id(struct perf_cpu cpu);
>> /**
>> * cpu__get_core_id - Returns the core id as read from
>> * /sys/devices/system/cpu/cpuX/topology/core_id for the given CPU.
>> @@ -127,9 +134,15 @@ struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data);
>> */
>> struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data);
>> /**
>> - * aggr_cpu_id__core - Create an aggr_cpu_id with the core, die and socket
>> - * populated with the core, die and socket for cpu. The function signature is
>> - * compatible with aggr_cpu_id_get_t.
>> + * aggr_cpu_id__cluster - Create an aggr_cpu_id with cluster, die and socket
>> + * populated with the cluster, die and socket for cpu. The function signature
>> + * is compatible with aggr_cpu_id_get_t.
>> + */
>> +struct aggr_cpu_id aggr_cpu_id__cluster(struct perf_cpu cpu, void *data);
>> +/**
>> + * aggr_cpu_id__core - Create an aggr_cpu_id with the core, cluster, die and
>> + * socket populated with the core, cluster, die and socket for cpu. The function
>> + * signature is compatible with aggr_cpu_id_get_t.
>> */
>> struct aggr_cpu_id aggr_cpu_id__core(struct perf_cpu cpu, void *data);
>> /**
>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>> index 7c527e65c186..2a2c37cc40b7 100644
>> --- a/tools/perf/util/env.h
>> +++ b/tools/perf/util/env.h
>> @@ -12,6 +12,7 @@ struct perf_cpu_map;
>> struct cpu_topology_map {
>> int socket_id;
>> int die_id;
>> + int cluster_id;
>> int core_id;
>> };
>>
>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>> index 8c61f8627ebc..4dfe7d9517a9 100644
>> --- a/tools/perf/util/stat-display.c
>> +++ b/tools/perf/util/stat-display.c
>> @@ -201,6 +201,9 @@ static void print_aggr_id_std(struct perf_stat_config *config,
>> snprintf(buf, sizeof(buf), "S%d-D%d-L%d-ID%d",
>> id.socket, id.die, id.cache_lvl, id.cache);
>> break;
>> + case AGGR_CLUSTER:
>> + snprintf(buf, sizeof(buf), "S%d-D%d-CLS%d", id.socket, id.die, id.cluster);
>> + break;
>> case AGGR_DIE:
>> snprintf(buf, sizeof(buf), "S%d-D%d", id.socket, id.die);
>> break;
>> @@ -251,6 +254,10 @@ static void print_aggr_id_csv(struct perf_stat_config *config,
>> fprintf(config->output, "S%d-D%d-L%d-ID%d%s%d%s",
>> id.socket, id.die, id.cache_lvl, id.cache, sep, aggr_nr, sep);
>> break;
>> + case AGGR_CLUSTER:
>> + fprintf(config->output, "S%d-D%d-CLS%d%s%d%s",
>> + id.socket, id.die, id.cluster, sep, aggr_nr, sep);
>> + break;
>> case AGGR_DIE:
>> fprintf(output, "S%d-D%d%s%d%s",
>> id.socket, id.die, sep, aggr_nr, sep);
>> @@ -300,6 +307,10 @@ static void print_aggr_id_json(struct perf_stat_config *config,
>> fprintf(output, "\"cache\" : \"S%d-D%d-L%d-ID%d\", \"aggregate-number\" : %d, ",
>> id.socket, id.die, id.cache_lvl, id.cache, aggr_nr);
>> break;
>> + case AGGR_CLUSTER:
>> + fprintf(output, "\"cluster\" : \"S%d-D%d-CLS%d\", \"aggregate-number\" : %d, ",
>> + id.socket, id.die, id.cluster, aggr_nr);
>> + break;
>> case AGGR_DIE:
>> fprintf(output, "\"die\" : \"S%d-D%d\", \"aggregate-number\" : %d, ",
>> id.socket, id.die, aggr_nr);
>> @@ -1248,6 +1259,7 @@ static void print_header_interval_std(struct perf_stat_config *config,
>> case AGGR_NODE:
>> case AGGR_SOCKET:
>> case AGGR_DIE:
>> + case AGGR_CLUSTER:
>> case AGGR_CACHE:
>> case AGGR_CORE:
>> fprintf(output, "#%*s %-*s cpus",
>> @@ -1550,6 +1562,7 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
>> switch (config->aggr_mode) {
>> case AGGR_CORE:
>> case AGGR_CACHE:
>> + case AGGR_CLUSTER:
>> case AGGR_DIE:
>> case AGGR_SOCKET:
>> case AGGR_NODE:
>> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
>> index 4357ba114822..d6e5c8787ba2 100644
>> --- a/tools/perf/util/stat.h
>> +++ b/tools/perf/util/stat.h
>> @@ -48,6 +48,7 @@ enum aggr_mode {
>> AGGR_GLOBAL,
>> AGGR_SOCKET,
>> AGGR_DIE,
>> + AGGR_CLUSTER,
>> AGGR_CACHE,
>> AGGR_CORE,
>> AGGR_THREAD,
>
> .
>