Re: [PATCH 1/2] cgroup/cpuset: record DL BW alloc CPU for attach rollback

From: Waiman Long

Date: Fri Apr 17 2026 - 14:52:24 EST


On 4/16/26 11:37 PM, Guopeng Zhang wrote:
cpuset_can_attach() allocates DL bandwidth only when migrating
deadline tasks to a disjoint CPU mask, but cpuset_cancel_attach()
rolls back based only on nr_migrate_dl_tasks. This makes the DL
bandwidth alloc/free paths asymmetric: rollback can call dl_bw_free()
even when no dl_bw_alloc() was done.

Rollback also needs to undo the reservation against the same CPU/root
domain that was charged. Record the CPU used by dl_bw_alloc() and use
that state in cpuset_cancel_attach(). If no allocation happened,
dl_bw_cpu stays at -1 and rollback skips dl_bw_free(). If allocation
did happen, bandwidth is returned to the same CPU/root domain.

Successful attach paths are unchanged; this only fixes the rollback
accounting when an attach fails.

Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
Signed-off-by: Guopeng Zhang <zhangguopeng@xxxxxxxxxx>
---
kernel/cgroup/cpuset-internal.h | 5 +++++
kernel/cgroup/cpuset.c | 13 +++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index fd7d19842ded..bb4e692bea30 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -168,6 +168,11 @@ struct cpuset {
int nr_deadline_tasks;
int nr_migrate_dl_tasks;
u64 sum_migrate_dl_bw;
+ /*
+ * CPU used for temporary DL bandwidth allocation during attach;
+ * -1 if no DL bandwidth was allocated in the current attach.
+ */
+ int dl_bw_cpu;
/* Invalid partition error code, not lock protected */
enum prs_errcode prs_err;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1335e437098e..e3a081a07c6d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -288,6 +288,7 @@ struct cpuset top_cpuset = {
.flags = BIT(CS_CPU_EXCLUSIVE) |
BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
.partition_root_state = PRS_ROOT,
+ .dl_bw_cpu = -1,
};
/**
@@ -579,6 +580,8 @@ static struct cpuset *dup_or_alloc_cpuset(struct cpuset *cs)
if (!trial)
return NULL;
+ trial->dl_bw_cpu = -1;
+
/* Setup cpumask pointer array */
cpumask_var_t *pmask[4] = {
&trial->cpus_allowed,
@@ -2980,6 +2983,7 @@ static void reset_migrate_dl_data(struct cpuset *cs)
{
cs->nr_migrate_dl_tasks = 0;
cs->sum_migrate_dl_bw = 0;
+ cs->dl_bw_cpu = -1;
}
/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
@@ -3056,6 +3060,8 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
reset_migrate_dl_data(cs);
goto out_unlock;
}
+
+ cs->dl_bw_cpu = cpu;
}
out_success:
@@ -3080,12 +3086,11 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
mutex_lock(&cpuset_mutex);
dec_attach_in_progress_locked(cs);
- if (cs->nr_migrate_dl_tasks) {
- int cpu = cpumask_any(cs->effective_cpus);
+ if (cs->dl_bw_cpu >= 0)
+ dl_bw_free(cs->dl_bw_cpu, cs->sum_migrate_dl_bw);
- dl_bw_free(cpu, cs->sum_migrate_dl_bw);
+ if (cs->nr_migrate_dl_tasks)
reset_migrate_dl_data(cs);
- }
mutex_unlock(&cpuset_mutex);
}

The patch looks correct to me.

Reviewed-by: Waiman Long <longman@xxxxxxxxxx>

However, I have a DL bandwidth accounting question, unrelated to this patch, that I would like the scheduler people to clarify. The allocation of additional DL BW is based on the condition:
        if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {

IOW, additional DL BW needs to be allocated when the effective CPUs of the old and new cpusets don't overlap. However, they could still be in the same root domain. Does that mean we will be double counting it?

Looking at it from the other side, the root domain may have enough DL BW for the task migration, but the subset of CPUs in the cpuset itself may not have enough total DL BW to host all the DL tasks being migrated. Is that a problem?

Cheers,
Longman