Re: [PATCH v1] perf/x86: Fix potential bad container_of in intel_pmu_hw_config
From: Falcon, Thomas
Date: Fri Mar 13 2026 - 17:58:46 EST
On Thu, 2026-03-12 at 12:44 -0700, Ian Rogers wrote:
> On Thu, Mar 12, 2026 at 12:43 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > Auto counter reload may have a group of events with software events
> > present within it. The software event PMU isn't the x86_hybrid_pmu and
> > a container_of operation in intel_pmu_set_acr_caused_constr (via the
> > hybrid helper) could cause out-of-bounds memory reads. Avoid this by
> > guarding the call to intel_pmu_set_acr_caused_constr with an
> > is_x86_event check.
> >
> > Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> +Thomas Falcon
>
> Thanks,
> Ian
>
> > ---
> > This fix was prompted by failure to get a BUG_ON in this series:
> > https://lore.kernel.org/lkml/a61eae6d-7a6d-40bd-83ec-bd4ea7657b9d@xxxxxxxxxxxxxxx/
> > and so I ran an AI analysis to see if there were similar bad casts as
> > spotted in:
> > https://lore.kernel.org/lkml/20260311075201.2951073-2-dapeng1.mi@xxxxxxxxxxxxxxx/
> > The AI analysis found this issue and its much more verbose
> > description is below:
> >
> > I have evaluated all callers of the hybrid_pmu function within the
> > arch/x86/events directory. The vast majority of usages are safe
> > because they operate on an event that is currently being initialized
> > by the x86 PMU subsystem, which guarantees that event->pmu is
> > inherently an x86 PMU.
> >
> > However, there is a distinct bug where a non-x86 PMU (e.g., a software
> > PMU) can be inadvertently passed into the hybrid_pmu function.
> >
> > This issue occurs during group event validation and configuration in
> > intel_pmu_hw_config.
> >
> > The Vulnerability Flow
> >
> > 1. The Context:
> >
> > Inside intel_pmu_hw_config (located in arch/x86/events/intel/core.c),
> > there is logic to handle Automatic Counter Reload (ACR) capabilities
> > for event groups. The code needs to identify siblings that cause other
> > events to reload and apply constraints to them.
> >
> > 2. The Missing Check:
> >
> > In the first pass through the sibling events, the code correctly
> > checks if each sibling is an x86 event via !is_x86_event(sibling)
> > before building a cause_mask. However, in the second pass to apply
> > the constraints, the code iterates over all siblings again but omits
> > the is_x86_event(sibling) check: arch/x86/events/intel/core.c#n4847
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4847)
> >
> > 	if (leader->nr_siblings) {
> > 		for_each_sibling_event(sibling, leader)
> > 			intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask); // <-- Missing is_x86_event() check!
> > 	}
> >
Hi Ian, thanks for cc'ing me. The fix looks good to me. I was initially confused because the first pass described above appears to check for non-x86 events and bail out, but it only rejects such events when they fall between ACR events, since "ACR events must be contiguous." Software events placed outside that range are still valid, so the is_x86_event() check in the second pass is needed.
Thanks,
Reviewed-by: Thomas Falcon <thomas.falcon@xxxxxxxxx>
> > 3. The Invalid Cast:
> >
> > The intel_pmu_set_acr_caused_constr function takes this sibling event
> > (which could be a software event) and executes the hybrid macro over
> > its pmu pointer: arch/x86/events/intel/core.c#n4624
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4624)
> >
> > static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
> >                                                    int idx, u64 cause_mask)
> > {
> > 	if (test_bit(idx, (unsigned long *)&cause_mask))
> > 		event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
> > }
> >
> > 4. The Root Cause:
> >
> > The hybrid macro expands and passes the event->pmu to hybrid_pmu:
> > arch/x86/events/perf_event.h#n788
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n788)
> >
> > #define hybrid(_pmu, _field)				\
> > ...
> > 	if (is_hybrid() && (_pmu))			\
> > 		__Fp = &hybrid_pmu(_pmu)->_field;	\
> >
> > Which subsequently results in a blind container_of on a non-x86 PMU
> > pointer: arch/x86/events/perf_event.h#n780
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n780)
> >
> > static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
> > {
> > 	return container_of(pmu, struct x86_hybrid_pmu, pmu);
> > }
> >
> > Conclusion
> >
> > If a user creates an event group led by an x86 ACR event but includes
> > a non-x86 sibling event (like a software event), the second traversal
> > in intel_pmu_hw_config will blindly pass the software PMU to
> > hybrid_pmu. Because container_of assumes the PMU is embedded inside an
> > x86_hybrid_pmu struct, the resulting pointer becomes invalid, leading
> > to memory corruption or an out-of-bounds access when attempting to
> > read the acr_cause_mask64 property.
> > ---
> > arch/x86/events/intel/core.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index cf3a4fe06ff2..26e829b8a882 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -4844,8 +4844,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
> > intel_pmu_set_acr_caused_constr(leader, idx++, cause_mask);
> >
> > if (leader->nr_siblings) {
> > - for_each_sibling_event(sibling, leader)
> > - intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
> > + for_each_sibling_event(sibling, leader) {
> > + if (is_x86_event(sibling))
> > + intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
> > + }
> > }
> >
> > if (leader != event)
> > --
> > 2.53.0.851.ga537e3e6e9-goog
> >