[RFC PATCH v2 1/2] perf/x86/rapl: Fix energy-cores event

From: Wyes Karny
Date: Wed Mar 01 2023 - 13:16:14 EST


For quite some time, energy-cores event is broken, because RAPL PMU
assumes all the events on this PMU are uncore and sets rapl_cpu_mask
with the first available CPU on the die. Therefore, for energy-cores
event if we read MSR form pmu->cpu, it's wrong. But the following two
changes helped to hide this issue.

- commit 704e2f5b700d ("perf stat: Use affinity for enabling/disabling
events")
- commit e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore
events")

These two changes together acted as a workaround for energy-cores event.
First change affined perf events to respective CPUs whereas the second
change helped to pick the local CPU to read the MSR. In this way, MSRs
were read from the correct CPU. This works unless it's the first
reading. For the first reading the second patch doesn't apply and we
get wrong readings. Stephane reported this issue when a patch to enable
AMD energy-cores RAPL event was posted [1].

The right way to fix the issue is to get rid of RAPL being considered an
uncore event. That is a larger change. To enable current RAPL usage,
work around the issue by conditionally remove the
`PERF_EV_CAP_READ_ACTIVE_PKG` flag for energy-cores event. Also, use the
event's CPU instead for PMU's CPU to read the MSR.

[1]: https://lore.kernel.org/lkml/CABPqkBQ_bSTC-OEe_LrgUrpj2VsseX1ThvO-yLcEtF8vb4+AAw@xxxxxxxxxxxxxx/#t

Fixes: e64cd6f73ff5 ("perf/x86: Use PMUEF_READ_CPU_PKG in uncore events")
Signed-off-by: Wyes Karny <wyes.karny@xxxxxxx>
---
arch/x86/events/rapl.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 52e6e7ed4f78..e6a0c077daf5 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -343,14 +343,15 @@ static int rapl_pmu_event_init(struct perf_event *event)
if (event->cpu < 0)
return -EINVAL;

- event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
-
if (!cfg || cfg >= NR_RAPL_DOMAINS + 1)
return -EINVAL;

cfg = array_index_nospec((long)cfg, NR_RAPL_DOMAINS + 1);
bit = cfg - 1;

+ if (bit != PERF_RAPL_PP0)
+ event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;
+
/* check event supported */
if (!(rapl_cntr_mask & (1 << bit)))
return -EINVAL;
@@ -363,7 +364,15 @@ static int rapl_pmu_event_init(struct perf_event *event)
pmu = cpu_to_rapl_pmu(event->cpu);
if (!pmu)
return -EINVAL;
- event->cpu = pmu->cpu;
+
+ /*
+ * FIXME: RAPL PMU considers events are uncore and MSRs can be read from
+ * the first available CPU of the die. But this is not true for energy-cores
+ * event. Therefore as a workaround don't consider pmu->cpu here for PERF_RAPL_PP0.
+ */
+ if (event->event_caps & PERF_EV_CAP_READ_ACTIVE_PKG)
+ event->cpu = pmu->cpu;
+
event->pmu_private = pmu;
event->hw.event_base = rapl_msrs[bit].msr;
event->hw.config = cfg;
--
2.34.1