[BUG] resctrl: using smp_processor_id() in preemptible code in __l3_mon_event_count() via mbm_handle_overflow() during CPU hotplug

From: Qinyun Tan

Date: Thu Jun 11 2026 - 03:18:58 EST

Hi all,

While stress-testing resctrl under heavy CPU hotplug on an AMD platform
with a DEBUG_PREEMPT/LOCKDEP kernel, I hit a recurring splat originating
from the MBM overflow handler. My analysis suggests it is a latent issue in
the generic fs/resctrl code (not AMD-specific), and it is still present in
current mainline, where the relevant code is identical.

Environment:
- Kernel 6.6.x with CONFIG_DEBUG_PREEMPT=y, CONFIG_LOCKDEP=y,
  CONFIG_PROVE_LOCKING=y
- x86 platform with multiple L3 monitor domains, MBM enabled
- Trigger: continuous CPU online/offline storm while MBM monitoring runs

Splat (representative):

BUG: using smp_processor_id() in preemptible [00000000] code: kworker/225:1/3750
caller is __l3_mon_event_count+0x73/0xb70
CPU: 432 PID: 3750 Comm: kworker/225:1 ... +debug
Workqueue: events mbm_handle_overflow
Call Trace:
 check_preemption_disabled+0xd1/0xe0
 __l3_mon_event_count+0x73/0xb70
 __mon_event_count+0x1c4/0x940
 mbm_update_one_event+0xc2/0x300
 mbm_handle_overflow+0x115/0x2f0
 process_one_work+0x814/0x1790
 worker_thread+0x726/0x1320
 ...

Note that "kworker/225:1" is running on CPU 432: the per-CPU worker that
was bound to CPU 225 was unbound and migrated when CPU 225 went offline.
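
When triaging similar splats, the mismatch between the worker name and the
reported CPU can be checked mechanically. The following is a minimal
userspace sketch, not kernel code. It assumes the usual per-CPU worker naming
"kworker/<cpu>:<id>" (unbound workers use "kworker/u<pool>:<id>"); both
helper names are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical triage helper: extract the CPU that a per-CPU kworker was
 * created for from its comm string. Returns the CPU number, or -1 for
 * unbound workers ("kworker/uN:M") and for strings that are not kworker
 * names.
 */
static int kworker_home_cpu(const char *comm)
{
    int cpu, id;

    /* Per-CPU pool workers are named "kworker/<cpu>:<id>[H]". */
    if (sscanf(comm, "kworker/%d:%d", &cpu, &id) == 2 && cpu >= 0)
        return cpu;
    return -1;
}

/* Returns 1 if a per-CPU kworker is reported on a CPU other than its home. */
static int kworker_ran_off_home_cpu(const char *comm, int running_cpu)
{
    int home = kworker_home_cpu(comm);

    return home >= 0 && home != running_cpu;
}
```

Applied to the splat above, kworker_ran_off_home_cpu("kworker/225:1", 432)
reports a mismatch, which is the signature of a worker that was unbound by
hotplug.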

Root cause:
- mbm_over is a per-domain delayed_work, scheduled via
  schedule_delayed_work_on(d->mbm_work_cpu, ...), i.e. a bound work item
  that is expected to run on a CPU of domain @d.
- __l3_mon_event_count() reads per-CPU MBM MSRs and therefore relies on
  running on a CPU of @d. It does:

      int cpu = smp_processor_id();
      ...
      if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
              return -EINVAL;

- In the read-from-sysfs path this invariant is provided by
  smp_call_function_any() (IPI, preemption disabled, runs on a domain
  CPU). In the overflow path the invariant is provided only *implicitly*
  by the work running on a bound per-CPU kworker (is_percpu_thread()
  exempts such a thread from the smp_processor_id() debug check).
- When d->mbm_work_cpu goes offline, the workqueue unbinds the per-CPU
  worker (is_percpu_thread() becomes false) and the pending/just-woken
  work runs on a foreign CPU. The implicit invariant breaks:
    (a) smp_processor_id() is called in preemptible context, which
        triggers the DEBUG_PREEMPT splat;
    (b) the foreign CPU is not in d->hdr.cpu_mask, so the
        cpumask_test_cpu() check fails and the MBM read for that
        domain/tick is silently skipped.
  cpus_read_lock() held in mbm_handle_overflow() does not help: the
  unbind/migration happens before the lock is taken, and the lock
  neither disables preemption nor prevents migration.
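
To make case (b) concrete, here is a small userspace model of the guard. It
is only a sketch: struct model_domain, model_event_count() and the 64-bit
mask are hypothetical stand-ins for the domain, __l3_mon_event_count() and
d->hdr.cpu_mask, and the "read" is reduced to adding a delta.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor domain; the real structure is larger. */
struct model_domain {
    uint64_t cpu_mask;  /* bit n set => CPU n belongs to the domain */
    uint64_t mbm_total; /* accumulated count for the domain */
};

/*
 * Models the guard quoted above: the count is updated only when the
 * executing CPU belongs to the domain. Otherwise -EINVAL is returned and
 * the domain state is left untouched.
 */
static int model_event_count(struct model_domain *d, int running_cpu,
                             uint64_t delta)
{
    if (running_cpu < 0 || running_cpu >= 64)
        return -EINVAL;
    if (!(d->cpu_mask & (UINT64_C(1) << running_cpu)))
        return -EINVAL; /* foreign CPU: this tick's update is skipped */

    d->mbm_total += delta;
    return 0;
}
```

With a domain covering CPUs 0-3, a call with running_cpu = 2 updates the
total, while a call with running_cpu = 9 returns -EINVAL and changes nothing.
This mirrors the silent skip, but it does not model the DEBUG_PREEMPT check
in case (a).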

Impact (believed low / mostly harmless):
- Self-healing: at the end of each tick mbm_handle_overflow() re-picks an
  online d->mbm_work_cpu and reschedules, so the next tick runs correctly.
- Production kernels (DEBUG_PREEMPT=n): no warning; at worst one missed
  MBM update for the affected domain on the hotplug tick. No crash.
- No cross-domain corruption: @d is fixed via container_of(); the
  cpumask_test_cpu() guard turns the foreign-CPU case into a skip, not a
  bad read.
- So the practical damage is a transient one-tick MBM accounting gap plus
  DEBUG_PREEMPT noise under hotplug. That noise can mask other splats in
  CI/debug kernels, which is why I am reporting it.
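
The self-healing behaviour can be illustrated with a userspace model of two
consecutive ticks. This is a sketch under stated assumptions: the structure
and function names are hypothetical, masks are limited to 64 CPUs, and the
model picks the lowest-numbered online CPU of the domain, whereas the kernel
may pick any suitable one.

```c
#include <stdint.h>

/* Hypothetical model state; not a kernel structure. */
struct model_tick_domain {
    uint64_t cpu_mask;    /* CPUs belonging to the domain */
    int mbm_work_cpu;     /* CPU the next overflow tick is queued on */
    unsigned int updates; /* ticks that read the counters */
    unsigned int skipped; /* ticks lost to the foreign-CPU guard */
};

/* Lowest-numbered CPU that is both in the domain and online, or -1. */
static int pick_work_cpu(uint64_t domain_mask, uint64_t online_mask)
{
    uint64_t candidates = domain_mask & online_mask;

    for (int cpu = 0; cpu < 64; cpu++)
        if (candidates & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/*
 * One overflow tick. @running_cpu is where the work actually executed,
 * which differs from d->mbm_work_cpu when the worker was unbound.
 */
static void model_overflow_tick(struct model_tick_domain *d, int running_cpu,
                                uint64_t online_mask)
{
    if (running_cpu >= 0 && running_cpu < 64 &&
        (d->cpu_mask & (UINT64_C(1) << running_cpu)))
        d->updates++;
    else
        d->skipped++;

    /* Re-arm on an online CPU of the domain for the next tick. */
    d->mbm_work_cpu = pick_work_cpu(d->cpu_mask, online_mask);
}
```

For a domain of CPUs 0-3 whose work CPU 1 has just gone offline, a first tick
executed on foreign CPU 9 counts as skipped and re-arms the work on CPU 0; the
second tick, executed on CPU 0, counts as an update. That is the one-tick gap
described above.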

I could reproduce this reliably only by forcing the rare window at high
frequency (continuous hotplug + active MBM).

I do not have a good fix in mind. The read in __l3_mon_event_count()
fundamentally assumes it runs on a CPU of the domain, but during hotplug
the overflow work can be migrated off that CPU; neither the cpus_read_lock()
held here nor the existing cpumask_test_cpu() guard addresses the
preemptible-context use of smp_processor_id() itself.

I would appreciate your guidance on how this should best be addressed.

I can provide the full log and a reproducer on request.

Thanks,
Qinyun Tan