Re: [PATCH v2] perf/x86: Move event pointer setup earlier in x86_pmu_enable()
From: Peter Zijlstra
Date: Wed Mar 11 2026 - 13:37:03 EST
On Wed, Mar 11, 2026 at 09:37:08AM -0700, Ian Rogers wrote:
> Fwiw, an AI review was saying something similar to me.
How do you guys get this AI stuff to say anything sensible at all about
the code? Every time I try, it is telling me the most abject nonsense.
I've wasted hours of my life trying to guide it to known bugs, in vain.
> I wonder if the
> issue with NMI storms exists already via another path:
>
> By populating cpuc->events[idx] here, even for events that skip
> x86_pmu_start() due to the PERF_HES_ARCH check, can this lead to PEBS
> data corruption?
>
> For Intel PEBS, intel_pmu_pebs_late_setup() iterates over cpuc->event_list
> and enables PEBS_DATA_CFG for this counter index regardless of its stopped
> state. If the PMU hardware generates PEBS records for this uninitialized
> counter, or if there are leftover PEBS records from the counter's previous
> occupant (since x86_pmu_stop() does not drain the PEBS buffer),
> intel_perf_event_update_pmc() will now find a non-NULL event pointer.
> Will it incorrectly process these leftover records and attribute them
> to the stopped event?
Yes.
> Additionally, does this change leave the unthrottled event's hardware
> counter uninitialized?
Also yes.
> When x86_pmu_enable() moves a throttled event to a new counter, it sets
> PERF_HES_ARCH and skips calling x86_pmu_start(event, PERF_EF_RELOAD).
> Later, when the timer tick unthrottles the event, it calls
> perf_event_unthrottle(), which invokes event->pmu->start(event, 0).
> In x86_pmu_start(), because flags == 0 (lacking PERF_EF_RELOAD),
> x86_pmu_set_period() is skipped. Will the newly assigned hardware counter
> be enabled without being programmed with the event's period, potentially
> causing it to start from a garbage value and leading to incorrect sampling
> intervals or NMI storms?
Don't think you can get NMI storms this way. If the PMI trips, we'll do
set_period and that should fix it up.
So just wrong counting.