Re: [PATCH v4 2/5] perf stat: Setup the foundation to allow aggregation based on cache topology

From: K Prateek Nayak
Date: Tue May 23 2023 - 23:01:22 EST

Next message: Chao Yu: "Re: [PATCH v22 19/31] f2fs: Provide a splice-read wrapper"
Previous message: Huacai Chen: "Re: [PATCH v14 1/2] drm: add kms driver for loongson display controller"
In reply to: Arnaldo Carvalho de Melo: "Re: [PATCH v4 2/5] perf stat: Setup the foundation to allow aggregation based on cache topology"
Next in thread: K Prateek Nayak: "[PATCH v4 3/5] perf stat: Save cache level information when running perf stat record"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello Arnaldo,

On 5/24/2023 12:42 AM, Arnaldo Carvalho de Melo wrote:
> Em Wed, May 17, 2023 at 10:57:42PM +0530, K Prateek Nayak escreveu:
>> Processors based on chiplet architecture, such as AMD EPYC and Hygon do
>> not expose the chiplet details in the sysfs CPU topology information.
>> However, this information can be derived from the per CPU cache level
>> information from the sysfs.
>>
>> perf stat has already supported aggregation based on topology
>> information using core ID, socket ID, etc. It'll be useful to aggregate
>> based on the cache topology to detect problems like imbalance and
>> cache-to-cache sharing at various cache levels.
>>
>> This patch lays the foundation for aggregating data in perf stat based
>> on the processor's cache topology. The cmdline option to aggregate data
>> based on the cache topology is added in Patch 4 of the series while this
>> patch sets up all the necessary functions and variables required to
>> support the new aggregation option.
>>
>> The patch also adds support to display per-cache aggregation, or save it
>> as a JSON or CSV, as splitting it into a separate patch would break
>> builds when compiling with "-Werror=switch-enum" where the compiler will
>> complain about the lack of handling for the AGGR_CACHE case in the
>> output functions.
>>
>> Suggested-by: Gautham R. Shenoy <gautham.shenoy@xxxxxxx>
>> Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
>> ---
>> Changelog:
>> o v3->v4:
>> - Some parts of the previous Patch 2 have been put into subsequent
>> smaller patches (while being careful not to introduce any build
>> errors in case someone were to bisect through the series)
>> - Fixed comments.
>
> So I had to make the following changes, added this explanation to the
> resulting cset:
>
> Committer notes:
>
> Don't use perf_stat_config in tools/perf/util/cpumap.c, this would make
> code that is in util/, thus not really specific to a single builtin, use
> a specific builtin config structure.
>
> Move the functions introduced in this patch from
> tools/perf/util/cpumap.c since it needs access to builtin specific
> and is not strictly needed to live in the util/ directory.
>
> With this 'perf test python' is back building.
>
> - Arnaldo

An oversight on my part. Sorry about that. Thank you for fixing this and
picking up the changes :)

>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 68294ea499ae51d9..0528d1bc15d27705 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -150,7 +150,7 @@ static struct perf_stat perf_stat;
>
> static volatile sig_atomic_t done = 0;
>
> -struct perf_stat_config stat_config = {
> +static struct perf_stat_config stat_config = {
> .aggr_mode = AGGR_GLOBAL,
> .aggr_level = MAX_CACHE_LVL + 1,
> .scale = true,
> @@ -1251,6 +1251,129 @@ static struct option stat_options[] = {
> OPT_END()
> };
>
> +/**
> + * Calculate the cache instance ID from the map in
> + * /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_list
> + * Cache instance ID is the first CPU reported in the shared_cpu_list file.
> + */
> +static int cpu__get_cache_id_from_map(struct perf_cpu cpu, char *map)
> +{
> + int id;
> + struct perf_cpu_map *cpu_map = perf_cpu_map__new(map);
> +
> + /*
> + * If the map contains no CPU, consider the current CPU to
> + * be the first online CPU in the cache domain else use the
> + * first online CPU of the cache domain as the ID.
> + */
> + if (perf_cpu_map__empty(cpu_map))
> + id = cpu.cpu;
> + else
> + id = perf_cpu_map__cpu(cpu_map, 0).cpu;
> +
> + /* Free the perf_cpu_map used to find the cache ID */
> + perf_cpu_map__put(cpu_map);
> +
> + return id;
> +}
> +
> +/**
> + * cpu__get_cache_id - Returns 0 if successful in populating the
> + * cache level and cache id. Cache level is read from
> + * /sys/devices/system/cpu/cpuX/cache/indexY/level where as cache instance ID
> + * is the first CPU reported by
> + * /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_list
> + */
> +static int cpu__get_cache_details(struct perf_cpu cpu, struct perf_cache *cache)
> +{
> + int ret = 0;
> + u32 cache_level = stat_config.aggr_level;
> + struct cpu_cache_level caches[MAX_CACHE_LVL];
> + u32 i = 0, caches_cnt = 0;
> +
> + cache->cache_lvl = (cache_level > MAX_CACHE_LVL) ? 0 : cache_level;
> + cache->cache = -1;
> +
> + ret = build_caches_for_cpu(cpu.cpu, caches, &caches_cnt);
> + if (ret) {
> + /*
> + * If caches_cnt is not 0, cpu_cache_level data
> + * was allocated when building the topology.
> + * Free the allocated data before returning.
> + */
> + if (caches_cnt)
> + goto free_caches;
> +
> + return ret;
> + }
> +
> + if (!caches_cnt)
> + return -1;
> +
> + /*
> + * Save the data for the highest level if no
> + * level was specified by the user.
> + */
> + if (cache_level > MAX_CACHE_LVL) {
> + int max_level_index = 0;
> +
> + for (i = 1; i < caches_cnt; ++i) {
> + if (caches[i].level > caches[max_level_index].level)
> + max_level_index = i;
> + }
> +
> + cache->cache_lvl = caches[max_level_index].level;
> + cache->cache = cpu__get_cache_id_from_map(cpu, caches[max_level_index].map);
> +
> + /* Reset i to 0 to free entire caches[] */
> + i = 0;
> + goto free_caches;
> + }
> +
> + for (i = 0; i < caches_cnt; ++i) {
> + if (caches[i].level == cache_level) {
> + cache->cache_lvl = cache_level;
> + cache->cache = cpu__get_cache_id_from_map(cpu, caches[i].map);
> + }
> +
> + cpu_cache_level__free(&caches[i]);
> + }
> +
> +free_caches:
> + /*
> + * Free all the allocated cpu_cache_level data.
> + */
> + while (i < caches_cnt)
> + cpu_cache_level__free(&caches[i++]);
> +
> + return ret;
> +}
> +
> +/**
> + * aggr_cpu_id__cache - Create an aggr_cpu_id with cache instache ID, cache
> + * level, die and socket populated with the cache instache ID, cache level,
> + * die and socket for cpu. The function signature is compatible with
> + * aggr_cpu_id_get_t.
> + */
> +static struct aggr_cpu_id aggr_cpu_id__cache(struct perf_cpu cpu, void *data)
> +{
> + int ret;
> + struct aggr_cpu_id id;
> + struct perf_cache cache;
> +
> + id = aggr_cpu_id__die(cpu, data);
> + if (aggr_cpu_id__is_empty(&id))
> + return id;
> +
> + ret = cpu__get_cache_details(cpu, &cache);
> + if (ret)
> + return id;
> +
> + id.cache_lvl = cache.cache_lvl;
> + id.cache = cache.cache;
> + return id;
> +}
> +
> static const char *const aggr_mode__string[] = {
> [AGGR_CORE] = "core",
> [AGGR_CACHE] = "cache",
> diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> index 88d387200745de2f..a0719816a218d441 100644
> --- a/tools/perf/util/cpumap.c
> +++ b/tools/perf/util/cpumap.c
> @@ -3,8 +3,6 @@
> #include "cpumap.h"
> #include "debug.h"
> #include "event.h"
> -#include "header.h"
> -#include "stat.h"
> #include <assert.h>
> #include <dirent.h>
> #include <stdio.h>
> @@ -311,113 +309,6 @@ struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data)
> return id;
> }
>
> -extern struct perf_stat_config stat_config;
> -
> -int cpu__get_cache_id_from_map(struct perf_cpu cpu, char *map)
> -{
> - int id;
> - struct perf_cpu_map *cpu_map = perf_cpu_map__new(map);
> -
> - /*
> - * If the map contains no CPU, consider the current CPU to
> - * be the first online CPU in the cache domain else use the
> - * first online CPU of the cache domain as the ID.
> - */
> - if (perf_cpu_map__empty(cpu_map))
> - id = cpu.cpu;
> - else
> - id = perf_cpu_map__cpu(cpu_map, 0).cpu;
> -
> - /* Free the perf_cpu_map used to find the cache ID */
> - perf_cpu_map__put(cpu_map);
> -
> - return id;
> -}
> -
> -int cpu__get_cache_details(struct perf_cpu cpu, struct perf_cache *cache)
> -{
> - int ret = 0;
> - struct cpu_cache_level caches[MAX_CACHE_LVL];
> - u32 cache_level = stat_config.aggr_level;
> - u32 i = 0, caches_cnt = 0;
> -
> - cache->cache_lvl = (cache_level > MAX_CACHE_LVL) ? 0 : cache_level;
> - cache->cache = -1;
> -
> - ret = build_caches_for_cpu(cpu.cpu, caches, &caches_cnt);
> - if (ret) {
> - /*
> - * If caches_cnt is not 0, cpu_cache_level data
> - * was allocated when building the topology.
> - * Free the allocated data before returning.
> - */
> - if (caches_cnt)
> - goto free_caches;
> -
> - return ret;
> - }
> -
> - if (!caches_cnt)
> - return -1;
> -
> - /*
> - * Save the data for the highest level if no
> - * level was specified by the user.
> - */
> - if (cache_level > MAX_CACHE_LVL) {
> - int max_level_index = 0;
> -
> - for (i = 1; i < caches_cnt; ++i) {
> - if (caches[i].level > caches[max_level_index].level)
> - max_level_index = i;
> - }
> -
> - cache->cache_lvl = caches[max_level_index].level;
> - cache->cache = cpu__get_cache_id_from_map(cpu, caches[max_level_index].map);
> -
> - /* Reset i to 0 to free entire caches[] */
> - i = 0;
> - goto free_caches;
> - }
> -
> - for (i = 0; i < caches_cnt; ++i) {
> - if (caches[i].level == cache_level) {
> - cache->cache_lvl = cache_level;
> - cache->cache = cpu__get_cache_id_from_map(cpu, caches[i].map);
> - }
> -
> - cpu_cache_level__free(&caches[i]);
> - }
> -
> -free_caches:
> - /*
> - * Free all the allocated cpu_cache_level data.
> - */
> - while (i < caches_cnt)
> - cpu_cache_level__free(&caches[i++]);
> -
> - return ret;
> -}
> -
> -struct aggr_cpu_id aggr_cpu_id__cache(struct perf_cpu cpu, void *data)
> -{
> - int ret;
> - struct aggr_cpu_id id;
> - struct perf_cache cache;
> -
> - id = aggr_cpu_id__die(cpu, data);
> - if (aggr_cpu_id__is_empty(&id))
> - return id;
> -
> - ret = cpu__get_cache_details(cpu, &cache);
> - if (ret)
> - return id;
> -
> - id.cache_lvl = cache.cache_lvl;
> - id.cache = cache.cache;
> - return id;
> -}
> -
> int cpu__get_core_id(struct perf_cpu cpu)
> {
> int value, ret = cpu__get_topology_int(cpu.cpu, "core_id", &value);
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index 1212b4ab19386293..f394ccc0ccfbca4c 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -86,20 +86,6 @@ int cpu__get_socket_id(struct perf_cpu cpu);
> * /sys/devices/system/cpu/cpuX/topology/die_id for the given CPU.
> */
> int cpu__get_die_id(struct perf_cpu cpu);
> -/**
> - * Calculate the cache instance ID from the map in
> - * /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_list
> - * Cache instance ID is the first CPU reported in the shared_cpu_list file.
> - */
> -int cpu__get_cache_id_from_map(struct perf_cpu cpu, char *map);
> -/**
> - * cpu__get_cache_id - Returns 0 if successful in populating the
> - * cache level and cache id. Cache level is read from
> - * /sys/devices/system/cpu/cpuX/cache/indexY/level where as cache instance ID
> - * is the first CPU reported by
> - * /sys/devices/system/cpu/cpuX/cache/indexY/shared_cpu_list
> - */
> -int cpu__get_cache_details(struct perf_cpu cpu, struct perf_cache *cache);
> /**
> * cpu__get_core_id - Returns the core id as read from
> * /sys/devices/system/cpu/cpuX/topology/core_id for the given CPU.
> @@ -140,13 +126,6 @@ struct aggr_cpu_id aggr_cpu_id__socket(struct perf_cpu cpu, void *data);
> * aggr_cpu_id_get_t.
> */
> struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data);
> -/**
> - * aggr_cpu_id__cache - Create an aggr_cpu_id with cache instache ID, cache
> - * level, die and socket populated with the cache instache ID, cache level,
> - * die and socket for cpu. The function signature is compatible with
> - * aggr_cpu_id_get_t.
> - */
> -struct aggr_cpu_id aggr_cpu_id__cache(struct perf_cpu cpu, void *data);
> /**
> * aggr_cpu_id__core - Create an aggr_cpu_id with the core, die and socket
> * populated with the core, die and socket for cpu. The function signature is

--
Thanks and Regards,
Prateek

Next message: Chao Yu: "Re: [PATCH v22 19/31] f2fs: Provide a splice-read wrapper"
Previous message: Huacai Chen: "Re: [PATCH v14 1/2] drm: add kms driver for loongson display controller"
In reply to: Arnaldo Carvalho de Melo: "Re: [PATCH v4 2/5] perf stat: Setup the foundation to allow aggregation based on cache topology"
Next in thread: K Prateek Nayak: "[PATCH v4 3/5] perf stat: Save cache level information when running perf stat record"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]