[PATCH V2] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop

From: Duoming Zhou
Date: Mon Sep 19 2022 - 21:45:23 EST


There is a deadlock in rapl_pmu_event_stop(). The sequence is
shown below:

        (thread 1)               |        (thread 2)
rapl_pmu_event_stop()            | rapl_hrtimer_handle()
 ...                             |  if (!pmu->n_active)
 raw_spin_lock_irqsave() //(1)   |  ...
 ...                             |
 hrtimer_cancel()                |  raw_spin_lock_irqsave() //(2)
 (block forever)                 |

We hold pmu->lock at position (1) and call hrtimer_cancel() to wait
for rapl_hrtimer_handle() to finish, but rapl_hrtimer_handle() also
needs pmu->lock at position (2). As a result, rapl_pmu_event_stop()
blocks forever.

This patch moves hrtimer_cancel() out of the region protected by
raw_spin_lock_irqsave(), so that rapl_hrtimer_handle() can obtain
pmu->lock and hrtimer_cancel() can return.
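
The intended ordering can be sketched in userspace C. This is only an
illustration, not kernel code: struct fake_pmu, fake_timer_handle(),
fake_timer_cancel(), fake_event_stop() and run_stop_scenario() are
hypothetical names, a pthread mutex stands in for pmu->lock, and
pthread_join() stands in for the waiting behaviour of hrtimer_cancel().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct rapl_pmu. */
struct fake_pmu {
	pthread_mutex_t lock;	/* plays the role of pmu->lock */
	int n_active;		/* plays the role of pmu->n_active */
	int ticks;		/* work done by the callback */
	pthread_t timer;	/* plays the role of pmu->hrtimer */
};

/* Stand-in for rapl_hrtimer_handle(): it needs the lock to make progress. */
static void *fake_timer_handle(void *arg)
{
	struct fake_pmu *pmu = arg;

	pthread_mutex_lock(&pmu->lock);
	pmu->ticks++;
	pthread_mutex_unlock(&pmu->lock);
	return NULL;
}

/* Stand-in for hrtimer_cancel(): waits until the callback has finished. */
static void fake_timer_cancel(struct fake_pmu *pmu)
{
	pthread_join(pmu->timer, NULL);
}

/* Ordering used by the fix: drop the lock, then wait for the callback. */
static void fake_event_stop(struct fake_pmu *pmu)
{
	bool last;

	pthread_mutex_lock(&pmu->lock);
	assert(pmu->n_active > 0);
	pmu->n_active--;
	last = (pmu->n_active == 0);
	pthread_mutex_unlock(&pmu->lock);

	/* The lock is no longer held here, so the callback can take it. */
	if (last)
		fake_timer_cancel(pmu);
}

/* Runs the scenario once; returns the callback's tick count, or -1. */
static int run_stop_scenario(void)
{
	struct fake_pmu pmu = { .n_active = 1, .ticks = 0 };

	pthread_mutex_init(&pmu.lock, NULL);
	if (pthread_create(&pmu.timer, NULL, fake_timer_handle, &pmu) != 0)
		return -1;
	fake_event_stop(&pmu);
	pthread_mutex_destroy(&pmu.lock);
	return pmu.ticks;
}
```

In this sketch, calling pthread_join() while the mutex is still held
can reproduce the hang described above, because the callback thread
may be waiting for that same mutex.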

Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
---
Changes in v2:
- Move hrtimer_cancel() to the end of rapl_pmu_event_stop() function.

arch/x86/events/rapl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 77e3a47af5a..7c110092c83 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -281,8 +281,6 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 	if (!(hwc->state & PERF_HES_STOPPED)) {
 		WARN_ON_ONCE(pmu->n_active <= 0);
 		pmu->n_active--;
-		if (pmu->n_active == 0)
-			hrtimer_cancel(&pmu->hrtimer);

 		list_del(&event->active_entry);

@@ -300,6 +298,11 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 		hwc->state |= PERF_HES_UPTODATE;
 	}

+	if (!pmu->n_active) {
+		raw_spin_unlock_irqrestore(&pmu->lock, flags);
+		hrtimer_cancel(&pmu->hrtimer);
+		return;
+	}
 	raw_spin_unlock_irqrestore(&pmu->lock, flags);
 }

--
2.17.1