Re: [PATCH-next v5 6/6] cgroup/cpuset: Support multiple source/destination cpusets for cpuset_*attach()
From: Waiman Long
Date: Wed Jun 24 2026 - 19:06:25 EST
On 6/24/26 11:45 AM, Michal Koutný wrote:
Hello Waiman.
On Mon, Jun 01, 2026 at 10:32:03PM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
This problem is less an issue when enabling the cpuset controller as all
the newly created child cpusets will have exactly the same set of CPUs
and memory nodes except when deadline tasks are involved in migration
as the deadline task accounting data can be off.
It can be more problematic when the cpuset controller is disabled as
their set of CPUs and memory nodes may differ from their parent or with
the moving of multi-threaded process from different threaded cgroups.
When I generalize that it can be an issue for any threaded controller
that somehow relies on the _difference_ between old and new thread
membership.
So I checked some: pids and perf_events look alright (no
diff-dependency) but I noticed the very same issue is tackled in
sched_change_group/scx_cgroup_move_task and that there is a member
inside task_struct allocated for this state tracking already:
task_struct::scx::cgrp_moving_from
Fix that by tracking the set of source (old) and destination cpusets
in singly linked lists and iterating them all to properly update the
internal data. Also keep the current cs and oldcs variables up-to-date
with the css and task iterators.
So there would be more than a single use for something conceptually
like:
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 004e6d56a499a..740c02f220c75 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1326,6 +1326,9 @@ struct task_struct {
#ifdef CONFIG_PREEMPT_RT
struct llist_node cg_dead_lnode;
#endif /* CONFIG_PREEMPT_RT */
+#ifdef CONFIG_CGROUPS_MOVING_FROM
+ struct cgroup *cgrp_moving_from;
+#endif
#endif /* CONFIG_CGROUPS */
#ifdef CONFIG_X86_CPU_RESCTRL
u32 closid;
diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index 1a3af2ea2a794..5b63afe83f333 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -240,9 +240,6 @@ struct sched_ext_entity {
bool disallow; /* reject switching into SCX */
/* cold fields */
-#ifdef CONFIG_EXT_GROUP_SCHED
- struct cgroup *cgrp_moving_from;
-#endif
struct list_head tasks_node;
};
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308aec..d7e7d4477f862 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1186,6 +1186,7 @@ config EXT_GROUP_SCHED
depends on SCHED_CLASS_EXT && CGROUP_SCHED
select GROUP_SCHED_WEIGHT
select GROUP_SCHED_BANDWIDTH
+ select CGROUPS_MOVING_FROM
default y
endif #CGROUP_SCHED
@@ -1288,6 +1289,7 @@ config CPUSETS
depends on SMP
select UNION_FIND
select CPU_ISOLATION
+ select CGROUPS_MOVING_FROM
help
This option will let you create and manage CPUSETs which
allow dynamically partitioning a system into sets of CPUs and
I think this could simplify the before-after state tracking generally,
WDYT?
I had actually introduced a new task_struct field in an early version
to track the old cpuset to handle memory migration. However, Chen Ridong
showed me that we may not really need such granular detail, so I dropped
it in the newer versions. Also, sharing a common field between cpuset
and sched_ext can introduce complications, as we have to make sure that
we won't step on each other.
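To illustrate the coarser tracking, here is a rough user-space sketch of
keeping the source cpusets on a singly linked list instead of recording
the old cpuset in each task. This is only a model and not the code from
the patch: struct toy_cpuset, toy_track(), toy_count() and toy_demo()
are hypothetical names, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpuset; NOT the kernel's struct cpuset. */
struct toy_cpuset {
	const char *name;
	int nr_tasks;                   /* migrating tasks seen from this cpuset */
	int on_list;                    /* already linked into the list? */
	struct toy_cpuset *attach_next; /* singly linked list of tracked cpusets */
};

/* Link @cs into the list at most once and account one more task to it. */
static void toy_track(struct toy_cpuset **head, struct toy_cpuset *cs)
{
	if (!cs->on_list) {
		cs->attach_next = *head;
		*head = cs;
		cs->on_list = 1;
	}
	cs->nr_tasks++;
}

/* Walk every tracked cpuset and return how many distinct ones there are. */
static int toy_count(const struct toy_cpuset *head)
{
	int n = 0;

	for (; head; head = head->attach_next)
		n++;
	return n;
}

/*
 * Model one migration of three threads that come from two different
 * source cpusets; returns the number of distinct sources tracked.
 */
static int toy_demo(struct toy_cpuset *a, struct toy_cpuset *b)
{
	struct toy_cpuset *src_head = NULL;
	struct toy_cpuset *old_cs[] = { a, b, a };
	size_t i;

	for (i = 0; i < sizeof(old_cs) / sizeof(old_cs[0]); i++)
		toy_track(&src_head, old_cs[i]);

	/* @b was linked last, so it sits at the head of the list. */
	assert(src_head == b);
	return toy_count(src_head);
}
```

In this model the per-cpuset counters are enough to fix up each source
after the move, and nothing needs to be stored in the task itself.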
Thanks for the suggestion anyway. I will reconsider it if it turns out
that we really need such information to do the right thing.
Cheers,
Longman