Re: [PATCH v1] perf/x86: Fix potential bad container_of in intel_pmu_hw_config

From: Ian Rogers

Date: Thu Mar 12 2026 - 15:48:06 EST


On Thu, Mar 12, 2026 at 12:43 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> Auto counter reload may have a group of events with software events
> present within it. The software event PMU isn't the x86_hybrid_pmu, and
> a container_of operation in intel_pmu_set_acr_caused_constr (via the
> hybrid helper) could cause out-of-bounds memory reads. Avoid this by
> guarding the call to intel_pmu_set_acr_caused_constr with an
> is_x86_event check.
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>

+Thomas Falcon

Thanks,
Ian

> ---
> This fix was prompted by failure to get a BUG_ON in this series:
> https://lore.kernel.org/lkml/a61eae6d-7a6d-40bd-83ec-bd4ea7657b9d@xxxxxxxxxxxxxxx/
> and so I ran an AI analysis to see if there were similar bad casts as
> spotted in:
> https://lore.kernel.org/lkml/20260311075201.2951073-2-dapeng1.mi@xxxxxxxxxxxxxxx/
> The AI analysis found this issue and its much more verbose
> description is below:
>
> I have evaluated all callers of the hybrid_pmu function within the
> arch/x86/events directory. The vast majority of usages are safe
> because they operate on an event that is currently being initialized
> by the x86 PMU subsystem, which guarantees that event->pmu is
> inherently an x86 PMU.
>
> However, there is a distinct bug where a non-x86 PMU (e.g., a software
> PMU) can be inadvertently passed into the hybrid_pmu function.
>
> This issue occurs during group event validation and configuration in
> intel_pmu_hw_config.
>
> The Vulnerability Flow
>
> 1. The Context:
>
> Inside intel_pmu_hw_config (located in arch/x86/events/intel/core.c),
> there is logic to handle Automatic Counter Reload (ACR) capabilities
> for event groups. The code needs to identify siblings that cause other
> events to reload and apply constraints to them.
>
> 2. The Missing Check:
>
> In the first pass through the sibling events, the code correctly
> checks if each sibling is an x86 event via !is_x86_event(sibling)
> before building a cause_mask. However, in the second pass to apply
> the constraints, the code iterates over all siblings again but omits
> the is_x86_event(sibling) check: arch/x86/events/intel/core.c#n4847
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4847)
>
> 	if (leader->nr_siblings) {
> 		for_each_sibling_event(sibling, leader)
> 			intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask); // <-- Missing is_x86_event() check!
> 	}
>
> 3. The Invalid Cast:
>
> The intel_pmu_set_acr_caused_constr function takes this sibling event
> (which could be a software event) and executes the hybrid macro over
> its pmu pointer: arch/x86/events/intel/core.c#n4624
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4624)
>
> static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
> 						   int idx, u64 cause_mask)
> {
> 	if (test_bit(idx, (unsigned long *)&cause_mask))
> 		event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
> }
>
> 4. The Root Cause:
>
> The hybrid macro expands and passes the event->pmu to hybrid_pmu:
> arch/x86/events/perf_event.h#n788
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n788)
>
> #define hybrid(_pmu, _field)				\
> ...
> 	if (is_hybrid() && (_pmu))			\
> 		__Fp = &hybrid_pmu(_pmu)->_field;	\
>
> This subsequently results in a blind container_of on a non-x86 PMU
> pointer: arch/x86/events/perf_event.h#n780
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n780)
>
> static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
> {
> 	return container_of(pmu, struct x86_hybrid_pmu, pmu);
> }
>
> Conclusion
>
> If a user creates an event group led by an x86 ACR event but includes
> a non-x86 sibling event (like a software event), the second traversal
> in intel_pmu_hw_config will blindly pass the software PMU to
> hybrid_pmu. Because container_of assumes the PMU is embedded inside an
> x86_hybrid_pmu struct, the resulting pointer becomes invalid, leading
> to memory corruption or an out-of-bounds access when attempting to
> read the acr_cause_mask64 field.
> ---
> arch/x86/events/intel/core.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index cf3a4fe06ff2..26e829b8a882 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4844,8 +4844,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
> intel_pmu_set_acr_caused_constr(leader, idx++, cause_mask);
>
> if (leader->nr_siblings) {
> - for_each_sibling_event(sibling, leader)
> - intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
> + for_each_sibling_event(sibling, leader) {
> + if (is_x86_event(sibling))
> + intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
> + }
> }
>
> if (leader != event)
> --
> 2.53.0.851.ga537e3e6e9-goog
>