Re: [PATCH] perf/x86: Restore event pointer setup in x86_pmu_start()

From: Peter Zijlstra

Date: Mon Mar 09 2026 - 12:48:54 EST


On Mon, Mar 09, 2026 at 07:40:56AM -0700, Breno Leitao wrote:
> A production AMD EPYC system crashed with a NULL pointer dereference
> in the PMU NMI handler:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000198
> RIP: x86_perf_event_update+0xc/0xa0
> Call Trace:
> <NMI>
> amd_pmu_v2_handle_irq+0x1a6/0x390
> perf_event_nmi_handler+0x24/0x40
>
> The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
> corresponding to the `if (unlikely(!hwc->event_base))` check in
> x86_perf_event_update() where hwc = &event->hw and event is NULL.
>
> drgn inspection of the vmcore on CPU 106 showed a mismatch between
> cpuc->active_mask and cpuc->events[]:
>
> active_mask: 0x1e (bits 1, 2, 3, 4)
> events[1]: 0xff1100136cbd4f38 (valid)
> events[2]: 0x0 (NULL, but active_mask bit 2 set)
> events[3]: 0xff1100076fd2cf38 (valid)
> events[4]: 0xff1100079e990a90 (valid)
>
> The event that should occupy events[2] was found in event_list[2]
> with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
> (which clears hw.state and sets active_mask) but events[2] was
> never populated.
>
> Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
> showing it was stopped when the PMU rescheduled events, confirming the
> throttle-then-reschedule sequence occurred.
>
> The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
> and potential PEBS record loss") which moved the cpuc->events[idx]
> assignment out of x86_pmu_start() and into x86_pmu_enable(). This
> broke any path that calls pmu->start() without going through
> x86_pmu_enable() -- specifically the unthrottle path:
>
> perf_adjust_freq_unthr_events()
> -> perf_event_unthrottle_group()
> -> perf_event_unthrottle()
> -> event->pmu->start(event, 0)
> -> x86_pmu_start() // sets active_mask but not events[]
>
> The race sequence is:
>
> 1. A group of perf events overflows, triggering group throttle via
> perf_event_throttle_group(). All events are stopped: active_mask
> bits cleared, events[] preserved (x86_pmu_stop no longer clears
> events[] after commit 7e772a93eb61).
>
> 2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
> due to other scheduling activity. Stopped events that need to
> move counters get PERF_HES_ARCH set and events[old_idx] cleared.
> In x86_pmu_enable()'s own step 2 (the counter reprogramming pass),
> PERF_HES_ARCH causes these events to be skipped -- events[new_idx]
> is never set.


So why not just move this then? Having fewer sites that set that value
is better, no?

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 03ce1bc7ef2e..54b4c315d927 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1372,6 +1372,8 @@ static void x86_pmu_enable(struct pmu *pmu)
else if (i < n_running)
continue;

+ cpuc->events[hwc->idx] = event;
+
if (hwc->state & PERF_HES_ARCH)
continue;

@@ -1379,7 +1381,6 @@ static void x86_pmu_enable(struct pmu *pmu)
* if cpuc->enabled = 0, then no wrmsr as
* per x86_pmu_enable_event()
*/
- cpuc->events[hwc->idx] = event;
x86_pmu_start(event, PERF_EF_RELOAD);
}
cpuc->n_added = 0;