Re: [BUG] resctrl: using smp_processor_id() in preemptible code in __l3_mon_event_count() via mbm_handle_overflow() during CPU hotplug
From: Reinette Chatre
Date: Thu Jun 11 2026 - 11:22:37 EST
Thanks to Qinyun Tan for doing this stress testing and creating this detailed report.
On 6/11/26 8:02 AM, Luck, Tony wrote:
>> I do not have a good fix in mind. The read in __l3_mon_event_count()
>> fundamentally assumes it runs on a CPU of the domain, but during hotplug
>> the overflow work can be migrated off that CPU; neither the cpus_read_lock()
>> held here nor the existing cpumask_test_cpu() guard addresses the
>> preemptible-context use of smp_processor_id() itself.
>>
>> I would appreciate your guidance on how this should best be addressed.
>>
>> I can provide the full log and a reproducer on request.
>>
>
> Qinyun Tan,
>
> I think this is addressed by this pending patch:
>
> https://lore.kernel.org/all/b5178a191a8a660e1f4aed356484d4eebfbd30fc.1781029125.git.reinette.chatre@xxxxxxxxx/
>
> [At least the scenario seems similar with CPU offline and subsequent unbound run of a worker]
Indeed. That patch modifies the resctrl CPU offline handler to wait out any existing
work. Since the resctrl offline handler runs before the workqueue offline handler, I
expect this change to ensure that the work completes on the CPU going offline, leaving
no work for the workqueue offline handler to migrate to another CPU.
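To make the ordering argument concrete, here is a minimal userspace toy model. It is not kernel code and is not taken from the patch. All names (struct toy_overflow_work, toy_resctrl_offline(), toy_workqueue_offline(), toy_cpu_offline()) are hypothetical, and "waiting out" the work is reduced to clearing a pending flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state for one overflow work item on a CPU going offline. */
struct toy_overflow_work {
	bool pending;    /* still outstanding on the outgoing CPU */
	int migrations;  /* times the workqueue handler moved it elsewhere */
};

/*
 * Models the resctrl offline handler. With wait_out set (the behaviour the
 * pending patch is described as adding), existing work is waited out, so
 * nothing remains outstanding on the outgoing CPU afterwards.
 */
static void toy_resctrl_offline(struct toy_overflow_work *w, bool wait_out)
{
	if (wait_out)
		w->pending = false;
}

/* Models the workqueue offline handler: leftover work gets migrated. */
static void toy_workqueue_offline(struct toy_overflow_work *w)
{
	if (w->pending)
		w->migrations++;
}

/* CPU offline: resctrl's handler runs first, then the workqueue's. */
static int toy_cpu_offline(bool wait_out)
{
	struct toy_overflow_work w = { .pending = true, .migrations = 0 };

	toy_resctrl_offline(&w, wait_out);
	toy_workqueue_offline(&w);
	return w.migrations;
}
```

In this model toy_cpu_offline(false) reports one migration, which corresponds to the reported scenario, while toy_cpu_offline(true) reports none because the earlier handler already disposed of the work.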
The same patch also adds protection within the worker itself against this scenario: it
ensures that, when the worker runs, it is still a "per CPU thread", so that once it starts
running, smp_processor_id() can be used safely.
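A similarly hedged userspace sketch of the worker-side guard follows. It is modelled on the kernel's is_percpu_thread() helper, which checks that PF_NO_SETAFFINITY is set and that exactly one CPU is allowed. Whether the patch uses that helper, and that the worker simply skips the read when the check fails, are assumptions. struct toy_task, toy_is_percpu_thread() and toy_overflow_worker() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the task state a "per CPU thread" check inspects. */
struct toy_task {
	bool no_setaffinity;  /* models PF_NO_SETAFFINITY */
	int nr_cpus_allowed;  /* number of CPUs the task may run on */
	int cpu;              /* CPU the task is currently running on */
};

/* Models is_percpu_thread(): affinity locked and exactly one CPU allowed. */
static bool toy_is_percpu_thread(const struct toy_task *t)
{
	return t->no_setaffinity && t->nr_cpus_allowed == 1;
}

/*
 * Hypothetical overflow worker: read the current CPU (standing in for
 * smp_processor_id()) only when the task cannot migrate; otherwise skip.
 * Returns the CPU used, or -1 if the read was skipped.
 */
static int toy_overflow_worker(const struct toy_task *t)
{
	if (!toy_is_percpu_thread(t))
		return -1;
	return t->cpu;
}
```

A worker still bound to a single CPU reads that CPU, while one whose affinity was widened after its CPU went offline (nr_cpus_allowed greater than 1) skips the read instead of using a CPU number that may not belong to the domain.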
Reinette