[PATCH V3] perf/x86/rapl: fix deadlock in rapl_pmu_event_stop
From: Duoming Zhou
Date: Wed Sep 21 2022 - 08:14:06 EST
There is a deadlock in rapl_pmu_event_stop(). The sequence is
shown below:
   (thread 1)                   |   (thread 2)
rapl_pmu_event_stop()           | rapl_hrtimer_handle()
 ...                            |  if (!pmu->n_active)
 raw_spin_lock_irqsave() //(1)  |  ...
 ...                            |
 hrtimer_cancel()               |  raw_spin_lock_irqsave() //(2)
 (block forever)
We hold pmu->lock at position (1) and call hrtimer_cancel(), which
waits for rapl_hrtimer_handle() to finish, but rapl_hrtimer_handle()
also needs pmu->lock at position (2). As a result,
rapl_pmu_event_stop() blocks forever.
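The lock conflict can be reproduced in miniature in userspace. The
sketch below is not kernel code: it assumes a pthread mutex stands in
for pmu->lock and a thread stands in for the timer callback, and the
names callback() and demo_lock_conflict() are hypothetical. The
callback uses pthread_mutex_trylock() so the demo reports the conflict
instead of hanging.

```c
#include <pthread.h>

/* Stand-in for pmu->lock (assumption: a plain pthread mutex). */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int callback_got_lock = -1;

/* Stand-in for the timer callback: it needs the lock to make progress. */
static void *callback(void *arg)
{
	(void)arg;
	int rc = pthread_mutex_trylock(&lock);

	callback_got_lock = (rc == 0);
	if (rc == 0)
		pthread_mutex_unlock(&lock);
	return NULL;
}

/*
 * Stand-in for the stop path: take the lock, then wait for the callback.
 * Returns 0, because the callback cannot take the lock while we hold it.
 * With a blocking lock in the callback, the join would never return.
 */
int demo_lock_conflict(void)
{
	pthread_t t;

	pthread_mutex_lock(&lock);
	pthread_create(&t, NULL, callback, NULL);
	pthread_join(t, NULL);	/* safe here only because of trylock */
	pthread_mutex_unlock(&lock);
	return callback_got_lock;
}
```

Calling demo_lock_conflict() returns 0: the waiter holds the lock the
callback needs, which is the same shape as positions (1) and (2).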
This patch replaces hrtimer_cancel() with hrtimer_try_to_cancel()
and moves the check "if (!pmu->n_active)" under the protection of
pmu->lock. If the timer callback is running,
hrtimer_try_to_cancel() returns immediately instead of waiting for
it. Once rapl_pmu_event_stop() has finished, pmu->n_active equals 0
and rapl_hrtimer_handle() returns HRTIMER_NORESTART.
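The resulting pattern can be sketched in userspace as follows. This is
only an illustration under stated assumptions: struct fake_pmu,
fake_timer_handle() and fake_event_stop() are hypothetical names, a
pthread mutex stands in for pmu->lock, and the bool return value stands
in for HRTIMER_RESTART / HRTIMER_NORESTART.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct rapl_pmu. */
struct fake_pmu {
	pthread_mutex_t lock;
	int n_active;
	int updates;	/* counts how often the handler did real work */
};

/*
 * Timer callback: n_active is checked while holding the lock.
 * Returns true to re-arm (HRTIMER_RESTART), false otherwise.
 */
bool fake_timer_handle(struct fake_pmu *pmu)
{
	pthread_mutex_lock(&pmu->lock);
	if (!pmu->n_active) {
		pthread_mutex_unlock(&pmu->lock);
		return false;
	}
	pmu->updates++;
	pthread_mutex_unlock(&pmu->lock);
	return true;
}

/*
 * Stop path: decrement under the lock and never wait for the callback.
 * In the kernel, hrtimer_try_to_cancel() would be called here when
 * n_active reaches 0; it does not block on a running callback.
 */
void fake_event_stop(struct fake_pmu *pmu)
{
	pthread_mutex_lock(&pmu->lock);
	pmu->n_active--;
	pthread_mutex_unlock(&pmu->lock);
}
```

With one active event, fake_timer_handle() re-arms before
fake_event_stop() and declines to re-arm after it, so a callback that
was already running simply finishes on its own.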
Fixes: 65661f96d3b3 ("perf/x86: Add RAPL hrtimer support")
Signed-off-by: Duoming Zhou <duoming@xxxxxxxxxx>
---
Changes in V3:
- Use hrtimer_try_to_cancel() to replace hrtimer_cancel().
- Use pmu->lock to protect the check "if (!pmu->n_active)".
arch/x86/events/rapl.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 77e3a47af5a..a526a08ee6e 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -219,11 +219,13 @@ static enum hrtimer_restart rapl_hrtimer_handle(struct hrtimer *hrtimer)
 	struct perf_event *event;
 	unsigned long flags;
-	if (!pmu->n_active)
-		return HRTIMER_NORESTART;
-
 	raw_spin_lock_irqsave(&pmu->lock, flags);
+	if (!pmu->n_active) {
+		raw_spin_unlock_irqrestore(&pmu->lock, flags);
+		return HRTIMER_NORESTART;
+	}
+
 	list_for_each_entry(event, &pmu->active_list, active_entry)
 		rapl_event_update(event);
@@ -282,7 +284,7 @@ static void rapl_pmu_event_stop(struct perf_event *event, int mode)
 		WARN_ON_ONCE(pmu->n_active <= 0);
 		pmu->n_active--;
 		if (pmu->n_active == 0)
-			hrtimer_cancel(&pmu->hrtimer);
+			hrtimer_try_to_cancel(&pmu->hrtimer);
 		list_del(&event->active_entry);
--
2.17.1