Re: [PATCH v3 1/5] perf cpumap: If the die_id is missing use socket/physical_package_id

From: James Clark
Date: Wed Dec 18 2024 - 07:04:23 EST




On 16/12/2024 11:24 pm, Ian Rogers wrote:
An error value for a missing die_id may be written into things like
the cpu_topology_map. As the topology needs to be fully written out,
including the die_id, to allow perf.data file features to be aligned
we can't allow error values to be written out. Instead base the
missing die_id value off of the socket/physical_package_id assuming
they correlate 1:1.

Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
---
tools/perf/util/cpumap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 27094211edd8..d362272f8466 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -283,7 +283,8 @@ int cpu__get_die_id(struct perf_cpu cpu)
{
int value, ret = cpu__get_topology_int(cpu.cpu, "die_id", &value);
- return ret ?: value;
+ /* If die_id is missing fallback on using the socket/physical_package_id. */
+ return ret || value < 0 ? cpu__get_socket_id(cpu) : value;
}
struct aggr_cpu_id aggr_cpu_id__die(struct perf_cpu cpu, void *data)

Hi Ian,

I sent a fix for the same or a similar problem here [1]. For this one I'm not sure why we'd want to use the socket ID for die when it's always been 0 for not present. I wonder if this change is mingling two things: fixing the negative error value appearing and replacing die with socket ID.

Personally I would prefer to keep the 0 to fix the error value, that way nobody gets surprised by the change.

Also it looks like cpu__get_cluster_id() can suffer from the same issue, and if we do it this way we should drop these as they aren't valid anymore:

/* There is no die_id on legacy system. */
if (die < 0)
die = 0;

And last minor comment this could do with a fixes: 05be17eed774

[1]: https://lore.kernel.org/linux-perf-users/20241218115552.912517-1-james.clark@xxxxxxxxxx/T/#u