[PATCH 4/4] fs/resctrl: Fix issues with worker threads when CPUs are taken offline

From: Tony Luck

Date: Fri May 08 2026 - 14:22:57 EST

From: Reinette Chatre <reinette.chatre@xxxxxxxxx>

Sashiko noticed[1] a use-after-free in the resctrl worker thread code
where the rdt_l3_mon_domain structure was freed while the worker was blocked
waiting for locks.

The root issue is that cancel_delayed_work() does not block in the case where
the worker thread is executing. This results in the race that Sashiko noticed,
but also causes problems when the CPU that has been chosen to service the
worker thread is taken offline.
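
For illustration only, the return convention that the fix relies on can be
modeled in userspace. This is a sketch, not kernel code: the model_* names
and the three-state enum are hypothetical, and the only assumption carried
over from the kernel is that cancel_delayed_work() returns true when it
dequeued pending work and false otherwise, without waiting for a running
callback.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a delayed work item's lifecycle. */
enum model_work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct model_work {
	enum model_work_state state;
};

/*
 * Model of the non-blocking cancel: pending work is dequeued and true is
 * returned. If the callback is already running (or nothing is queued)
 * the call returns false immediately and leaves the callback untouched.
 */
static bool model_cancel_delayed_work(struct model_work *w)
{
	if (w->state == WORK_PENDING) {
		w->state = WORK_IDLE;
		return true;
	}
	return false;
}
```

A false return for a running worker is the case that the old code ignored.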

Note that worker threads are allowed to delete their own work_struct
(see comment in kernel/workqueue.c:process_one_work()) so there can't be
any problems on the return path from the worker in this case where the
work_struct was deleted by other code while the worker was executing.

Indicate failure of cancel_delayed_work() calls in resctrl_offline_cpu()
by setting d->mbm_work_cpu or d->cqm_work_cpu to nr_cpu_ids. Make the worker
threads check to see if they are no longer bound to the right CPU. In this
case search the L3 domain list for any domain(s) with the work cpu set to
nr_cpu_ids. In the case where the last CPU was removed from a domain, the
domain has been removed from the list and there is nothing to do. If the
domain still exists, then restart the worker on any of the remaining CPUs.
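
The handoff described above can be sketched as a small userspace model.
Everything named model_* or MODEL_* is hypothetical and stands in for the
kernel objects (MODEL_NR_CPU_IDS for nr_cpu_ids, a bitmask for the domain's
CPU mask). The model assumes the array holds only domains that still exist,
mirroring the fact that a domain whose last CPU went offline is no longer
on the list.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPU_IDS 8u	/* stand-in for nr_cpu_ids */

struct model_domain {
	unsigned int cpu_mask;	/* bit n set: CPU n is online in this domain */
	unsigned int work_cpu;	/* CPU servicing the worker, or sentinel */
};

/* Return the lowest-numbered online CPU, or the sentinel if none. */
static unsigned int model_pick_any_cpu(const struct model_domain *d)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (d->cpu_mask & (1u << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/*
 * Offline path. cancel_ok models the return value of the cancel call:
 * true means the pending work was dequeued and can be moved right away,
 * false means the worker is executing and must reschedule itself.
 */
static void model_offline_cpu(struct model_domain *d, unsigned int cpu,
			      bool cancel_ok)
{
	d->cpu_mask &= ~(1u << cpu);
	if (cpu != d->work_cpu)
		return;
	d->work_cpu = cancel_ok ? model_pick_any_cpu(d) : MODEL_NR_CPU_IDS;
}

/*
 * Worker path taken when the worker finds it is no longer bound to its
 * CPU: restart the worker for every surviving domain that carries the
 * sentinel. Returns the number of domains restarted.
 */
static size_t model_worker_rescan(struct model_domain *doms, size_t n)
{
	size_t restarted = 0;

	for (size_t i = 0; i < n; i++) {
		if (doms[i].work_cpu == MODEL_NR_CPU_IDS) {
			doms[i].work_cpu = model_pick_any_cpu(&doms[i]);
			restarted++;
		}
	}
	return restarted;
}
```

In this model a rescan over an array with no marked domain restarts nothing,
which corresponds to the "last CPU removed" case where the worker just exits.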

Remove redundant cancel_delayed_work() calls from resctrl_offline_mon_domain().

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Co-developed-by: Tony Luck <tony.luck@xxxxxxxxx>
Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
---
fs/resctrl/monitor.c | 55 +++++++++++++++++++++++++++++++++++++++++++
fs/resctrl/rdtgroup.c | 27 +++++++++++++++------
2 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..02434d11e024 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -791,12 +791,38 @@ static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
*/
void cqm_handle_limbo(struct work_struct *work)
{
+ struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
struct rdt_l3_mon_domain *d;

cpus_read_lock();
mutex_lock(&rdtgroup_mutex);

+ /*
+ * Worker was blocked waiting for the CPU it was running on to go
+ * offline. Handle two scenarios:
+ * - Worker was running on the last CPU of a domain. The domain and
+ * thus the work_struct has been freed so do not attempt to obtain
+ * domain via container_of(). All remaining domains have limbo
+ * handlers so the loop will not find any domains needing a
+ * limbo handler. Just exit.
+ * - Worker was running on CPU that just went offline with other
+ * CPUs in domain still running and available to take over the
+ * worker. Offline handler could not schedule a new worker on
+ * another CPU in the domain but signaled that this needs to be
+ * done by setting cqm_work_cpu to nr_cpu_ids. Find the domain
+ * that needs a worker and schedule it after the normal CQM
+ * interval.
+ */
+ if (!is_percpu_thread()) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->cqm_work_cpu == nr_cpu_ids)
+ cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL,
+ RESCTRL_PICK_ANY_CPU);
+ }
+ goto out_unlock;
+ }
+
d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);

__check_limbo(d, false);
@@ -808,6 +834,7 @@ void cqm_handle_limbo(struct work_struct *work)
delay);
}

+out_unlock:
mutex_unlock(&rdtgroup_mutex);
cpus_read_unlock();
}
@@ -852,6 +879,34 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;

r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+
+ /*
+ * Worker was blocked waiting for the CPU it was running on to go
+ * offline. Handle two scenarios:
+ * - Worker was running on the last CPU of a domain. The domain and
+ * thus the work_struct has been freed so do not attempt to obtain
+ * domain via container_of(). All remaining domains have overflow
+ * handlers so the loop will not find any domains needing an
+ * overflow handler. Just exit.
+ * - Worker was running on CPU that just went offline with other
+ * CPUs in domain still running and available to take over the
+ * worker. Offline handler could not schedule a new worker on
+ * another CPU in the domain but signaled that this needs to be
+ * done by setting mbm_work_cpu to nr_cpu_ids. Find the domain
+ * that needs a worker and schedule it to run after the normal
+ * MBM interval. This is completely safe on CPUs with wide MBM
+ * counters. Likely OK for old CPUs with narrow counters as the
+ * MBM_OVERFLOW_INTERVAL was picked conservatively.
+ */
+ if (!is_percpu_thread()) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ if (d->mbm_work_cpu == nr_cpu_ids)
+ mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL,
+ RESCTRL_PICK_ANY_CPU);
+ }
+ goto out_unlock;
+ }
+
d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);

list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 62e1e4c30f78..bab9afd5066e 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4343,8 +4343,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
goto out_unlock;

d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
- if (resctrl_is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
+
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
/*
* When a package is going down, forcefully
@@ -4355,7 +4354,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
* package never comes back.
*/
__check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
}

domain_destroy_l3_mon_state(d);
@@ -4536,13 +4534,28 @@ void resctrl_offline_cpu(unsigned int cpu)
d = get_mon_domain_from_cpu(cpu, l3);
if (d) {
if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
- cancel_delayed_work(&d->mbm_over);
- mbm_setup_overflow_handler(d, 0, cpu);
+ if (cancel_delayed_work(&d->mbm_over)) {
+ mbm_setup_overflow_handler(d, 0, cpu);
+ } else {
+ /*
+ * Unable to schedule work on new CPU if it
+ * is currently running since the re-schedule
+ * will just force new work to run on
+ * current CPU. Mark domain's worker as
+ * needing to be rescheduled to be handled
+ * by worker itself.
+ */
+ d->mbm_work_cpu = nr_cpu_ids;
+ }
}
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
- cancel_delayed_work(&d->cqm_limbo);
- cqm_setup_limbo_handler(d, 0, cpu);
+ if (cancel_delayed_work(&d->cqm_limbo)) {
+ cqm_setup_limbo_handler(d, 0, cpu);
+ } else {
+ /* Same as mbm_work_cpu case above */
+ d->cqm_work_cpu = nr_cpu_ids;
+ }
}
}

--
2.54.0