Re: [PATCH] cgroup/cpuset: Support multiple source/destination cpusets using pids pattern
From: Waiman Long
Date: Fri Jun 05 2026 - 13:23:03 EST
On 6/5/26 3:35 AM, Ridong Chen wrote:
> On 6/4/2026 2:47 AM, Waiman Long wrote:
>> On 6/3/26 6:26 AM, Ridong Chen wrote:
>>> The current cpuset_can_attach() and cpuset_attach() functions assume task
>>> migration is from one source cpuset to one destination cpuset. This can be
>>> wrong in several scenarios:
>>> - Moving a multi-threaded process with threads in different cpusets
>>> - Disabling the cpuset controller (many children to one parent)
>>> - Enabling the cpuset controller (one parent to many children)
>>>
>>> Fix this by adopting the pids subsystem's per-task accounting pattern.
>>> In cpuset_can_attach(), use task_cs(task) to get the correct source cpuset
>>> for each task (like pids_can_attach uses task_css), adjust nr_deadline_tasks
>>> and reserve DL bandwidth per-task, and increment attach_in_progress per-task
>>> on the destination cpuset. In cpuset_attach(), handle destination cpuset
>>> changes within the task iteration loop.
>>>
>>> A shared helper cpuset_undo_attach() reverses the per-task operations for
>>> both partial rollback in cpuset_can_attach() and full reversal in
>>> cpuset_cancel_attach().
>>>
>>> When multiple source cpusets are detected in can_attach(), set
>>> attach_many_sources so that cpuset_attach() forces cpus_updated and
>>> mems_updated to true, ensuring all tasks get properly updated regardless
>>> of which source cpuset cpuset_attach_old_cs points to.
>>>
>>> This eliminates the need for nr_migrate_dl_tasks, sum_migrate_dl_bw, and
>>> dl_bw_cpu fields in struct cpuset.
>>>
>>> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
>>> Signed-off-by: Ridong Chen <ridong.chen@xxxxxxxxx>
>>
>> It is not a problem doing per-task DL BW allocation and eliminating the
>> *dl_bw* fields. However, updating nr_deadline_tasks before it is
>> committed can be problematic.
>
> Good to hear that.
>
>> nr_deadline_tasks is used in dl_rebuild_rd_accounting() which is called
>> by partition_sched_domains_locked(). After the release of cpuset_mutex
>> at the end of cpuset_can_attach() and before cpuset_attach() or
>> cpuset_cancel_attach() is called, it is possible
>> that partition_sched_domains_locked() can be called
>> and dl_rebuild_rd_accounting() is not getting the right DL BW accounting
>> information. So unless there is a way to confirm that this situation
>> cannot happen, we can't change nr_deadline_tasks before the attach is
>> committed.
>
> We can keep the nr_migrate_dl_tasks field and update nr_deadline_tasks
> once migration is complete. I think this will be much simpler than
> fixing the issue using lists.

But we still need to track the set of source and destination cpusets to commit or cancel the change. Doing it task-by-task will add code in the cpuset_attach() and cpuset_cancel_attach() to check if a task is a DL task and act accordingly. So we are just trading task-by-task code with code to handle the lists.
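
To make the trade-off concrete, here is a minimal userspace sketch of the deferred-commit idea discussed in this thread. It is not kernel code: the cpuset_model and task_model structs and the model_*() helpers are hypothetical, and keeping nr_migrate_dl_tasks on the *source* cpuset is an assumption made only for illustration. The can_attach step records intent without touching nr_deadline_tasks, so a reader running between the two phases still sees the committed count. The attach and cancel steps then have to re-check every task for DL status, which is the per-task code mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; this is NOT the kernel's struct cpuset. */
struct cpuset_model {
	int nr_deadline_tasks;   /* committed count, what accounting readers see */
	int nr_migrate_dl_tasks; /* assumed: DL tasks pending to leave this cpuset */
};

struct task_model {
	struct cpuset_model *cs; /* current (source) cpuset of the task */
	bool is_dl;              /* true for a SCHED_DEADLINE task */
};

/* Phase 1: record intent per source cpuset; committed counts stay untouched. */
static void model_can_attach(struct task_model *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].is_dl)
			tasks[i].cs->nr_migrate_dl_tasks++;
}

/* Phase 2a: commit. Only now are the committed counts adjusted. */
static void model_attach(struct task_model *tasks, size_t n,
			 struct cpuset_model *dst)
{
	for (size_t i = 0; i < n; i++) {
		struct cpuset_model *src = tasks[i].cs;

		if (tasks[i].is_dl) {
			assert(src->nr_migrate_dl_tasks > 0);
			src->nr_migrate_dl_tasks--;
			src->nr_deadline_tasks--;
			dst->nr_deadline_tasks++;
		}
		tasks[i].cs = dst;
	}
}

/* Phase 2b: cancel. Drop the pending marks; committed counts never changed. */
static void model_cancel_attach(struct task_model *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].is_dl) {
			assert(tasks[i].cs->nr_migrate_dl_tasks > 0);
			tasks[i].cs->nr_migrate_dl_tasks--;
		}
	}
}
```

In this model, both the commit and cancel paths walk the task set again and test is_dl per task. A list-based variant would instead remember which cpusets were touched in phase 1 and walk that list in phase 2.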
Cheers,
Longman