Re: [PATCH 1/2] perf/x86/intel/uncore: Skip discovery table for offline dies

From: Mi, Dapeng

Date: Thu Oct 30 2025 - 02:23:15 EST



On 10/30/2025 12:38 PM, Chen, Zide wrote:
> On 10/29/2025 6:37 PM, Mi, Dapeng wrote:
>> On 10/30/2025 6:07 AM, Zide Chen wrote:
>>> This warning can be triggered if NUMA is disabled and the system
>>> boots with fewer CPUs than the number of CPUs in die 0.
>>>
>>> WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore]
>>>
>>> Currently, the discovery table continues to be parsed even if all CPUs
>>> in the associated die are offline. This can lead to an array overflow
>>> at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may
>>> trigger the warning above or cause other issues.
>>>
>>> Reported-by: Steve Wahl <steve.wahl@xxxxxxx>
>>> Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
>>> Signed-off-by: Zide Chen <zide.chen@xxxxxxxxx>
>>> ---
>>> arch/x86/events/intel/uncore.c | 4 ++++
>>> arch/x86/events/intel/uncore_discovery.c | 2 +-
>>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>>> index ee586eb714ec..5c3aeea5c78d 100644
>>> --- a/arch/x86/events/intel/uncore.c
>>> +++ b/arch/x86/events/intel/uncore.c
>>> @@ -1380,6 +1380,10 @@ static void uncore_pci_pmus_register(void)
>>>
>>> for (node = rb_first(type->boxes); node; node = rb_next(node)) {
>>> unit = rb_entry(node, struct intel_uncore_discovery_unit, node);
>>> +
>>> + if (WARN_ON(unit->die >= uncore_max_dies()))
>> Based on my understanding, this seems to be a valid situation that could happen.
>> If so, we'd better remove the WARN_ON() to avoid misleading users. Thanks.
> Now, for invalid or offline die IDs, we skip parsing the discovery
> table, and no PMON units are expected to be inserted into the RB tree.
> Therefore, using WARN_ON() here seems appropriate.
>
> I put a WARN_ON() here because an invalid die ID could cause an array overflow.

Ok, it's good then. Thanks.


>
>>> + continue;
>>> +
>>> pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(unit->addr),
>>> UNCORE_DISCOVERY_PCI_BUS(unit->addr),
>>> UNCORE_DISCOVERY_PCI_DEVFN(unit->addr));
>>> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
>>> index 1bf6e4288577..d6aee12139f1 100644
>>> --- a/arch/x86/events/intel/uncore_discovery.c
>>> +++ b/arch/x86/events/intel/uncore_discovery.c
>>> @@ -388,7 +388,7 @@ static bool intel_uncore_has_discovery_tables_pci(int *ignore)
>>> (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
>>>
>>> die = get_device_die_id(dev);
>>> - if (die < 0)
>>> + if ((die < 0) || (die >= uncore_max_dies()))
>>> continue;
>>>
>>> parse_discovery_table(dev, die, bar_offset, &parsed, ignore);
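
For readers following the thread, here is a minimal user-space sketch of the
bounds check under discussion. It is not the kernel code: MAX_DIES stands in
for uncore_max_dies(), and struct box, struct pmu, die_id_valid() and
pmu_register_box() are hypothetical names used only for illustration. It
shows why a die ID must be rejected before it is used as an index, as in
"pmu->boxes[die] = box".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for uncore_max_dies(): dies known to the system. */
#define MAX_DIES 2

/* Simplified model of a PMON box that belongs to one die. */
struct box {
	int die;
};

/* Simplified model of a PMU holding one box slot per die. */
struct pmu {
	struct box *boxes[MAX_DIES];
};

/* A die ID is usable as an index only if it lies in [0, MAX_DIES). */
static bool die_id_valid(int die)
{
	return die >= 0 && die < MAX_DIES;
}

/*
 * Store the box in its per-die slot.
 * Returns 0 on success, or -1 when the die ID is out of range, in which
 * case the array is left untouched instead of being written out of bounds.
 */
static int pmu_register_box(struct pmu *pmu, struct box *box)
{
	assert(pmu != NULL && box != NULL);

	if (!die_id_valid(box->die))
		return -1;

	pmu->boxes[box->die] = box;
	return 0;
}
```

In this sketch, a box with die 0 or 1 is stored, while die 2 or a negative
die is rejected. The patch applies the same range test twice: once when
deciding whether to parse a discovery table, and once more (under WARN_ON())
before a unit from the RB tree is registered.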