Re: [PATCH v1] perf/x86: Fix potential bad container_of in intel_pmu_hw_config
From: Ian Rogers
Date: Fri Mar 13 2026 - 12:32:53 EST
On Thu, Mar 12, 2026 at 12:44 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Thu, Mar 12, 2026 at 12:43 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > An auto counter reload (ACR) event group may contain software
> > events. The software event PMU isn't an x86_hybrid_pmu, so the
> > container_of operation in intel_pmu_set_acr_caused_constr() (via the
> > hybrid() helper) could cause out-of-bounds memory reads. Avoid this
> > by guarding the call to intel_pmu_set_acr_caused_constr() with an
> > is_x86_event() check.
> >
> > Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> +Thomas Falcon
>
> Thanks,
> Ian
>
> > ---
> > This fix was prompted by a failure to hit a BUG_ON in this series:
> > https://lore.kernel.org/lkml/a61eae6d-7a6d-40bd-83ec-bd4ea7657b9d@xxxxxxxxxxxxxxx/
> > and so I ran an AI analysis to see if there were bad casts similar
> > to the one spotted in:
> > https://lore.kernel.org/lkml/20260311075201.2951073-2-dapeng1.mi@xxxxxxxxxxxxxxx/
> > The AI analysis found this issue; its much more verbose description
> > is below:
> >
> > I have evaluated all callers of the hybrid_pmu function within the
> > arch/x86/events directory. The vast majority of usages are safe
> > because they operate on an event that is currently being initialized
> > by the x86 PMU subsystem, which guarantees that event->pmu is
> > inherently an x86 PMU.
> >
> > However, there is a distinct bug where a non-x86 PMU (e.g., a software
> > PMU) can be inadvertently passed into the hybrid_pmu function.
> >
> > This issue occurs during group event validation and configuration in
> > intel_pmu_hw_config.
> >
> > The Vulnerability Flow
> >
> > 1. The Context:
> >
> > Inside intel_pmu_hw_config (located in arch/x86/events/intel/core.c),
> > there is logic to handle Automatic Counter Reload (ACR) capabilities
> > for event groups. The code needs to identify siblings that cause other
> > events to reload and apply constraints to them.
> >
> > 2. The Missing Check:
> >
> > In the first pass through the sibling events, the code correctly
> > checks if each sibling is an x86 event via !is_x86_event(sibling)
> > before building a cause_mask. However, in the second pass to apply
> > the constraints, the code iterates over all siblings again but omits
> > the is_x86_event(sibling) check: arch/x86/events/intel/core.c#n4847
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4847)
> >
> > if (leader->nr_siblings) {
> > 	for_each_sibling_event(sibling, leader)
> > 		intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask); // <-- Missing is_x86_event() check!
> > }
> >
> > 3. The Invalid Cast:
> >
> > The intel_pmu_set_acr_caused_constr function takes this sibling event
> > (which could be a software event) and executes the hybrid macro over
> > its pmu pointer: arch/x86/events/intel/core.c#n4624
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n4624)
> >
> > static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
> > 						   int idx, u64 cause_mask)
> > {
> > 	if (test_bit(idx, (unsigned long *)&cause_mask))
> > 		event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
> > }
> >
> > 4. The Root Cause:
> >
> > The hybrid macro expands and passes the event->pmu to hybrid_pmu:
> > arch/x86/events/perf_event.h#n788
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n788)
> >
> > #define hybrid(_pmu, _field)			\
> > ...
> > 	if (is_hybrid() && (_pmu))		\
> > 		__Fp = &hybrid_pmu(_pmu)->_field; \
> >
> > This subsequently results in a blind container_of on a non-x86 PMU
> > pointer: arch/x86/events/perf_event.h#n780
> > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/perf_event.h#n780)
> >
> > static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
> > {
> > 	return container_of(pmu, struct x86_hybrid_pmu, pmu);
> > }
> >
> > Conclusion
> >
> > If a user creates an event group led by an x86 ACR event but includes
> > a non-x86 sibling event (like a software event), the second traversal
> > in intel_pmu_hw_config will blindly pass the software PMU to
> > hybrid_pmu. Because container_of assumes the PMU is embedded inside an
> > x86_hybrid_pmu struct, the resulting pointer becomes invalid, leading
> > to memory corruption or an out-of-bounds access when attempting to
> > read the acr_cause_mask64 property.
> > ---
> > arch/x86/events/intel/core.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index cf3a4fe06ff2..26e829b8a882 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -4844,8 +4844,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
> > intel_pmu_set_acr_caused_constr(leader, idx++, cause_mask);
> >
> > if (leader->nr_siblings) {
> > - for_each_sibling_event(sibling, leader)
> > - intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
> > + for_each_sibling_event(sibling, leader) {
So, the fix appears functional, but my AI overlord is spotting more
issues with the ACR code that I'll relay below:
Since intel_pmu_hw_config() is called during perf_event_alloc() ->
perf_init_event() in sys_perf_event_open(), it appears sys_perf_event_open()
has not yet acquired the required ctx->mutex for the group leader's context.
Could a concurrent thread modify the sibling_list (e.g., by closing a
sibling's file descriptor, triggering perf_event_release_kernel() and
removing the sibling from the list) and cause list corruption or a
use-after-free?
Additionally, for_each_sibling_event() explicitly calls
lockdep_assert_event_ctx(leader).
Will this predictably trigger a kernel warning when CONFIG_PROVE_LOCKING is
enabled?
> > + if (is_x86_event(sibling))
> > + intel_pmu_set_acr_caused_constr(sibling, idx++, cause_mask);
When adding a new event to an existing ACR group, the existing leader and
siblings might be actively scheduled on the PMU by another CPU.
Modifying their dynamic constraints via a non-atomic bitwise AND update
(event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64)) without
holding ctx->lock or using the PMU event update protocol might allow the
PMU scheduler to read a stale or torn dyn_constraint value.
Could this unsynchronized call to intel_pmu_set_acr_caused_constr()
result in a data race?
Also, cause_mask is defined as a local 8-byte u64 stack variable. Since
perf_event_validate_size() allows event groups to contain several hundred
siblings, a user could construct a group with over 64 x86 siblings where
attr.config2 == 0 to bypass the num constraints.
(Ian note: the config2 sounds off here, but if the events are disabled
I think you can put as many as you like in a group)
When idx exceeds 63, the call to intel_pmu_set_acr_caused_constr() executes
test_bit(idx, (unsigned long *)&cause_mask).
Since cause_mask is passed by value and its address is taken in the inline
function, this will read memory past the bounds of the 8-byte variable,
accessing adjacent kernel stack data.
Can an attacker observe the resulting scheduling success or failure to
reliably infer the bits of the adjacent kernel stack memory?
Thanks,
Ian
> > + }
> > }
> >
> > if (leader != event)
> > --
> > 2.53.0.851.ga537e3e6e9-goog
> >