Re: [PATCH 1/2] cgroup/cpuset: record DL BW alloc CPU for attach rollback

From: Guopeng Zhang

Date: Sun Apr 19 2026 - 22:23:15 EST

On 2026/4/18 2:51, Waiman Long wrote:
> On 4/16/26 11:37 PM, Guopeng Zhang wrote:
>> cpuset_can_attach() allocates DL bandwidth only when migrating
>> deadline tasks to a disjoint CPU mask, but cpuset_cancel_attach()
>> rolls back based only on nr_migrate_dl_tasks. This makes the DL
>> bandwidth alloc/free paths asymmetric: rollback can call dl_bw_free()
>> even when no dl_bw_alloc() was done.
>>
>> Rollback also needs to undo the reservation against the same CPU/root
>> domain that was charged. Record the CPU used by dl_bw_alloc() and use
>> that state in cpuset_cancel_attach(). If no allocation happened,
>> dl_bw_cpu stays at -1 and rollback skips dl_bw_free(). If allocation
>> did happen, bandwidth is returned to the same CPU/root domain.
>>
>> Successful attach paths are unchanged. This only fixes failed attach
>> rollback accounting.
>>
>> Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@xxxxxxxxxx>
...
>
> The patch looks correct to me.
>
> Reviewed-by: Waiman Long <longman@xxxxxxxxxx>

Hi Waiman,

Thank you for the review and for the Reviewed-by.
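
For anyone following the thread, here is a minimal user-space sketch of the
rollback rule described in the patch. It is only a model, not the kernel
code: struct attach_state, struct root_domain_model, model_can_attach() and
model_cancel_attach() are hypothetical names made up for illustration. Only
the "dl_bw_cpu stays at -1 when nothing was allocated" convention comes from
the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-attach state; not the kernel struct. */
struct attach_state {
	int  nr_migrate_dl_tasks;  /* DL tasks counted for this attach */
	long sum_migrate_dl_bw;    /* bandwidth those tasks need */
	int  dl_bw_cpu;            /* CPU charged by the allocation, -1 if none */
};

/* Hypothetical stand-in for the bandwidth reserved in one root domain. */
struct root_domain_model {
	long total_bw;
};

/* Reserve bandwidth only when DL tasks move to a disjoint CPU mask. */
static void model_can_attach(struct attach_state *st,
			     struct root_domain_model *rd,
			     bool masks_disjoint, int cpu)
{
	assert(cpu >= 0);
	st->dl_bw_cpu = -1;
	if (st->nr_migrate_dl_tasks && masks_disjoint) {
		rd->total_bw += st->sum_migrate_dl_bw;
		st->dl_bw_cpu = cpu;  /* remember where the charge went */
	}
}

/* Roll back only what was actually reserved, then reset the state. */
static void model_cancel_attach(struct attach_state *st,
				struct root_domain_model *rd)
{
	if (st->dl_bw_cpu >= 0) {
		rd->total_bw -= st->sum_migrate_dl_bw;
		st->dl_bw_cpu = -1;
	}
	st->nr_migrate_dl_tasks = 0;
	st->sum_migrate_dl_bw = 0;
}
```

In this model, keying the rollback on nr_migrate_dl_tasks alone would
subtract bandwidth in the overlapping-mask case even though nothing was
added, which is the asymmetry the patch is meant to remove.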
>
> However, I have a DL bandwidth accounting question unrelated to this patch that I would like the scheduler people to clarify. The allocation of additional DL BW is based on the condition
>
>         if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
>
> IOW, additional DL BW will need to be allocated when the old and new cpuset doesn't overlap. However, they could still be in the same root domain. Does that mean we will be double counting it?

I think you are right to call this out. As far as I can tell, the current
check, !cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus), does
not guarantee that the migration crosses into a different root domain. If the
old and new cpusets are disjoint but still belong to the same root domain, it
looks possible that we reserve bandwidth on the destination side without a
corresponding subtraction from the source side. I will try to reproduce that
configuration and follow up with results.
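
To make the case concrete, here is a small user-space sketch of the
configuration I have in mind. It models CPU masks as 64-bit bitmaps; the
type cpu_mask_model and all three helpers are hypothetical and only mimic
the mask logic, they are not kernel APIs. The root domain span is passed in
as an assumed input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask model: bit n set means CPU n is in the mask. */
typedef uint64_t cpu_mask_model;

static bool masks_intersect(cpu_mask_model a, cpu_mask_model b)
{
	return (a & b) != 0;
}

static bool mask_subset(cpu_mask_model sub, cpu_mask_model super)
{
	return (sub & ~super) == 0;
}

/*
 * True when the disjointness check would trigger an allocation even
 * though both cpusets lie inside the same root domain span.
 */
static bool disjoint_but_same_root_domain(cpu_mask_model old_cpus,
					  cpu_mask_model new_cpus,
					  cpu_mask_model rd_span)
{
	return !masks_intersect(old_cpus, new_cpus) &&
	       mask_subset(old_cpus, rd_span) &&
	       mask_subset(new_cpus, rd_span);
}
```

For example, with old_cpus = 0x3 (CPUs 0-1), new_cpus = 0xC (CPUs 2-3) and
rd_span = 0xF (CPUs 0-3), the helper returns true: the masks are disjoint,
yet both sit in one root domain.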
>
> Looking from the other side, the root domain may have enough DL BW for the task migration, but the subset of CPUs in the cpuset itself may not have enough total DL BW to host all the DL tasks to be migrated, is that a problem?

My current understanding is that DL bandwidth accounting is done at
root-domain granularity, not at arbitrary cpuset-subset granularity. That
also seems consistent with Documentation/scheduler/sched-deadline.rst, which
says that deadline tasks cannot have a CPU affinity mask smaller than the
root domain they are created on, and that a restricted CPU set should be
achieved by creating a restricted root domain with cpuset.

So if a cpuset is only a subset inside a larger root domain, it does not seem
to get an independent DL bandwidth limit of its own. If that understanding is
correct, then the smaller cpuset not having enough bandwidth by itself would
be a limitation of that model rather than something this code checks
separately. I'd appreciate confirmation from the scheduler folks on that
point.

Thanks,
Guopeng
>
> Cheers,
> Longman