Re: [PATCH] perf/x86/amd/uncore: Add group validation

From: Sandipan Das

Date: Wed Jun 24 2026 - 02:39:52 EST


On 23-06-2026 22:46, Ian Rogers wrote:
> On Tue, Jun 23, 2026 at 3:49 AM Sandipan Das <sandipan.das@xxxxxxx> wrote:
>>
>> The amd_uncore driver currently does not validate event groups and
>> allows creation of groups with more events than the number of available
>> hardware counters. Because of this, pmu->event_init() succeeds but
>> counter assignment fails later in pmu->add() which returns -EBUSY once
>> all counters are exhausted.
>>
>> Address this by introducing group validation in the pmu->event_init()
>> path. Since the uncore PMUs have no per-event constraints and all
>> counters of a PMU are interchangeable, validation is reduced to just
>> counting the group members that target a PMU and ensuring that they fit
>> within the available set of counters.
>>
>> Signed-off-by: Sandipan Das <sandipan.das@xxxxxxx>
>
> This is great Sandipan! Thanks for addressing this! I'd been wondering
> whether the perf tool could test for hardware PMUs that do not fail
> properly at open. This is a problem for weak groups, as used
> by metrics, because they try to group all events and then break the
> group when the open fails. I'd observed that AMD uncore events
> supposedly opened but then failed during reading. I suspect other PMUs
> also suffer from this.
>
> Peter mentioned a behavior in the past: opening events in a group in a
> disabled state, with more events than counters, and then the software
> enables and disables events in the group to control counter
> allocation. The perf tool doesn't currently utilize this behavior but
> I think it explains some of the Sashiko feedback.
>

Thanks, I think it does explain the Sashiko feedback.
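
For anyone following along, the check described in the commit message
can be sketched in plain userspace C roughly as below. This is not the
patch itself: the sketch_* structs and the function name are
hypothetical stand-ins for the kernel's struct pmu and struct
perf_event, and returning -EINVAL for an oversized group is an
assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the kernel structures; only the
 * fields needed for the counting check are modelled.
 */
struct sketch_pmu {
	int num_counters;			/* interchangeable hardware counters */
};

struct sketch_event {
	const struct sketch_pmu *pmu;		/* PMU this event targets */
	struct sketch_event *leader;		/* group leader (self for a leader) */
	struct sketch_event *next_sibling;	/* next group member, or NULL */
};

/*
 * Count the group members that target the same PMU as the new event and
 * check that they fit in that PMU's counters. The new event is assumed
 * not to be linked into the sibling list yet, so it is counted up front.
 */
static int sketch_validate_group(const struct sketch_event *event)
{
	const struct sketch_event *leader = event->leader;
	const struct sketch_event *sibling;
	int count = 1;				/* the event being initialised */

	/* A lone leader needs one counter; assume the PMU has at least one. */
	if (leader == event)
		return 0;

	if (leader->pmu == event->pmu)
		count++;

	/* Members on other PMUs (e.g. software events) do not use our counters. */
	for (sibling = leader->next_sibling; sibling; sibling = sibling->next_sibling)
		if (sibling->pmu == event->pmu)
			count++;

	return count > event->pmu->num_counters ? -EINVAL : 0;
}
```

With a check of this shape, an oversized group is rejected when the
event is opened instead of failing later in pmu->add(), which is the
point at which a weak group needs the failure in order to be broken up.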

> Would it be possible to get a Fixes tag for stable backports?
>

Sure. This gap has been present in the amd_uncore driver since its
introduction in commit c43ca5091a37 ("perf/x86/amd: Add support for
AMD NB and L2I "uncore" counters.").