Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
From: Luck, Tony
Date: Wed May 06 2026 - 15:49:11 EST
On Wed, May 06, 2026 at 11:24:30AM -0700, Reinette Chatre wrote:
... trimmed discussion on how we got here ...
> schedule_delayed_work_on() will schedule the work but will do so on the CPU
> going offline. It does not seem as though schedule_delayed_work_on() should be
> used at all if the worker is currently running. As an alternative, when resctrl
> finds that it cannot cancel the work it can avoid attempting to reschedule it
> and instead just set rdt_l3_mon_domain::mbm_work_cpu to nr_cpu_ids to signal
> that this domain needs a worker to be scheduled, and that this is to be done
> by the exiting worker.
>
> Combining the previous ideas with the results from the experiments, I think
> the following may address the problem for the MBM overflow handler. It has not
> been expanded to include the limbo handler and is untested:
Initial testing looks good. I added a big mdelay() in mbm_handle_overflow()
before cpus_read_lock() to make it easy to hit the case where
cancel_delayed_work() fails. I tested both the "still have remaining CPUs in
the domain" and "this is the last CPU" cases, for both success and failure of
cancel_delayed_work().
It looks to me as though resctrl_offline_cpu() handles this completely, and
the additional cancel_delayed_work() calls in resctrl_offline_mon_domain()
aren't needed.
Do you agree that those can be deleted?
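For concreteness, the deletion would look roughly like this (a sketch against
my reading of the current resctrl_offline_mon_domain(); the __check_limbo(d,
true) forced decrement in the limbo branch stays):

--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ ... @@ void resctrl_offline_mon_domain(...)
-	if (resctrl_is_mbm_enabled())
-		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
 	    has_busy_rmid(d)) {
 		__check_limbo(d, true);
-		cancel_delayed_work(&d->cqm_limbo);
 	}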
I'll look at fixing the cqm_limbo path in the same style.
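Untested sketch of what I have in mind for resctrl_offline_cpu(), mirroring the
mbm_over handling above and assuming cqm_setup_limbo_handler() keeps its
current (domain, delay, exclude_cpu) arguments; cqm_handle_limbo() would grow
the matching "reschedule any domain marked cqm_work_cpu == nr_cpu_ids" check:

	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
	    cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
		if (cancel_delayed_work(&d->cqm_limbo)) {
			cqm_setup_limbo_handler(d, 0, cpu);
		} else {
			/*
			 * Limbo worker is running on this CPU. Mark the
			 * domain so the exiting worker reschedules itself
			 * on a surviving CPU of the domain.
			 */
			d->cqm_work_cpu = nr_cpu_ids;
		}
	}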
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 9fd901c78dc6..2e54042b7ee9 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -852,6 +852,30 @@ void mbm_handle_overflow(struct work_struct *work)
> goto out_unlock;
>
> r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> +
> + /*
> + * Worker was blocked waiting for the CPU it was running on to go
> + * offline. Handle two scenarios:
> + * - Worker was running on the last CPU of a domain. The domain and
> + * thus the work_struct has been freed so do not attempt to obtain
> + * domain via container_of(). All remaining domains have overflow
> + * handlers so the loop will not find any domains needing an
> + * overflow handler. Just exit.
> + * - Worker was running on CPU that just went offline with other
> + * CPUs in domain still running and available to take over the
> + * worker. Offline handler could not schedule a new worker on
> + * another CPU in the domain but signaled that this needs to be
> + * done by setting mbm_work_cpu to nr_cpu_ids. Find the domain
> + * that needs a worker and schedule it now.
> + */
> + if (!is_percpu_thread()) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + if (d->mbm_work_cpu == nr_cpu_ids)
> + mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL, RESCTRL_PICK_ANY_CPU);
> + }
> + goto out_unlock;
> + }
> +
> d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
>
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 02f87c4bc03c..cc8620ace7ed 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4539,8 +4539,19 @@ void resctrl_offline_cpu(unsigned int cpu)
> d = get_mon_domain_from_cpu(cpu, l3);
> if (d) {
> if (resctrl_is_mbm_enabled() && cpu == d->mbm_work_cpu) {
> - cancel_delayed_work(&d->mbm_over);
> - mbm_setup_overflow_handler(d, 0, cpu);
> + if (cancel_delayed_work(&d->mbm_over)) {
> + mbm_setup_overflow_handler(d, 0, cpu);
> + } else {
> + /*
> + * Unable to schedule work on new CPU if it
> + * is currently running since the re-schedule
> + * will just force new work to run on
> + * current CPU. Mark domain's worker as
> + * needing to be rescheduled to be handled
> + * by worker itself.
> + */
> + d->mbm_work_cpu = nr_cpu_ids;
> + }
> }
> if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
> cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
>
>
-Tony