Re: [PATCH/for-next v4 1/4] cgroup/cpuset: Clarify exclusion rules for cpuset internal variables
From: Waiman Long
Date: Mon Feb 09 2026 - 14:58:13 EST
On 2/8/26 10:41 PM, Chen Ridong wrote:
On 2026/2/7 4:37, Waiman Long wrote:
Clarify the locking rules associated with file level internal variables
inside the cpuset code. There is no functional change.
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/cgroup/cpuset.c | 105 ++++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 44 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index c43efef7df71..a4c6386a594d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -61,6 +61,58 @@ static const char * const perr_strings[] = {
[PERR_REMOTE] = "Have remote partition underneath",
};
+/*
+ * CPUSET Locking Convention
+ * -------------------------
+ *
+ * Below are the three global locks guarding cpuset structures in lock
+ * acquisition order:
+ * - cpu_hotplug_lock (cpus_read_lock/cpus_write_lock)
+ * - cpuset_mutex
+ * - callback_lock (raw spinlock)
+ *
+ * A task must hold all three locks to modify externally visible or
+ * used fields of cpusets, though some of the internally used cpuset fields
+ * and internal variables can be modified without holding callback_lock. If only
+ * reliable read access to the externally used fields is needed, a task can
+ * hold either cpuset_mutex or callback_lock, both of which are exposed to
+ * other subsystems.
+ *
+ * If a task holds cpu_hotplug_lock and cpuset_mutex, it blocks others,
+ * ensuring that it is the only task able to also acquire callback_lock and
+ * be able to modify cpusets. It can perform various checks on the cpuset
+ * structure first, knowing nothing will change. It can also allocate memory
+ * without holding callback_lock. While it is performing these checks, various
+ * callback routines can briefly acquire callback_lock to query cpusets. Once
+ * it is ready to make the changes, it takes callback_lock, blocking everyone
+ * else.
+ *
+ * Calls to the kernel memory allocator cannot be made while holding
+ * callback_lock, which is a raw spinlock, as the memory allocator may sleep
+ * or call back into cpuset code and acquire callback_lock.
+ *
+ * The task_struct fields mems_allowed and mempolicy may be changed by
+ * other tasks, so they are protected by the alloc_lock in the
+ * task_struct.
+ *
+ * The cpuset_common_seq_show() handlers only hold callback_lock across
+ * small pieces of code, such as when reading out possibly multi-word
+ * cpumasks and nodemasks.
+ */
+
+static DEFINE_MUTEX(cpuset_mutex);
+
+/*
+ * Each file level internal variable below follows one of these exclusion
+ * rules.
+ *
+ * RWCS: Read/write-able by holding either cpus_write_lock or both
+ * cpus_read_lock and cpuset_mutex.
+ *
Does this mean that variables can be read or written only by holding
cpus_write_lock?
I believe that to write cpuset variables, we must hold either (cpus_write_lock
and cpuset_mutex) or (cpus_read_lock and cpuset_mutex).
The purpose of the locking rule is to emphasize the condition for mutual exclusion. Once cpus_write_lock is held, no other task can hold cpus_read_lock and cpuset_mutex. I consider holding cpuset_mutex optional in that case, though almost all the cpuset internal variables are accessed from the CPU hotplug side with both cpus_write_lock and cpuset_mutex held. The only exception is force_sd_rebuild (sd_rebuild), which can be set directly from the scheduling code without holding cpuset_mutex.

I can change it to "holding cpus_write_lock (and optionally cpuset_mutex) or both cpus_read_lock and cpuset_mutex" if that makes it clearer.
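To make the mutual-exclusion argument concrete, here is a small userspace model. It is a sketch, not kernel code. It treats cpu_hotplug_lock as a reader/writer lock (many cpus_read_lock() holders or one cpus_write_lock() holder) and cpuset_mutex as an exclusive lock. It then enumerates every combination of held locks and counts pairs of tasks that a given write rule would let write at the same time. The names struct held_locks, rwcs_may_write(), weak_may_write(), can_coexist() and count_racing_pairs() are hypothetical and made up for this illustration. The model also ignores callback_lock and the force_sd_rebuild exception.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code: the cpuset-related locks
 * a task may hold. callback_lock is deliberately left out.
 */
enum hotplug_state {
	HP_NONE,	/* cpu_hotplug_lock not held */
	HP_READ,	/* cpus_read_lock(): shared */
	HP_WRITE,	/* cpus_write_lock(): exclusive */
};

struct held_locks {
	enum hotplug_state hotplug;
	bool cpuset_mutex;
};

typedef bool (*write_rule_fn)(struct held_locks h);

/* RWCS write rule: cpus_write_lock, or cpus_read_lock plus cpuset_mutex. */
static bool rwcs_may_write(struct held_locks h)
{
	return h.hotplug == HP_WRITE ||
	       (h.hotplug == HP_READ && h.cpuset_mutex);
}

/* Hypothetical weaker rule that drops cpus_read_lock from the second case. */
static bool weak_may_write(struct held_locks h)
{
	return h.hotplug == HP_WRITE || h.cpuset_mutex;
}

/*
 * Can two different tasks hold lock sets a and b at the same time?
 * cpu_hotplug_lock admits many readers or a single writer;
 * cpuset_mutex admits a single holder.
 */
static bool can_coexist(struct held_locks a, struct held_locks b)
{
	if (a.hotplug == HP_WRITE && b.hotplug != HP_NONE)
		return false;
	if (b.hotplug == HP_WRITE && a.hotplug != HP_NONE)
		return false;
	if (a.cpuset_mutex && b.cpuset_mutex)
		return false;
	return true;
}

/*
 * Count ordered pairs of tasks that 'rule' would both allow to write
 * while their lock sets can be held concurrently. A result of 0 means
 * the rule gives mutual exclusion between writers.
 */
static int count_racing_pairs(write_rule_fn rule)
{
	static const enum hotplug_state hp[] = { HP_NONE, HP_READ, HP_WRITE };
	int races = 0;

	/* 3 hotplug states x 2 mutex states = 6 lock sets per task. */
	for (int i = 0; i < 6; i++) {
		for (int j = 0; j < 6; j++) {
			struct held_locks a = { .hotplug = hp[i / 2],
						.cpuset_mutex = i % 2 };
			struct held_locks b = { .hotplug = hp[j / 2],
						.cpuset_mutex = j % 2 };

			if (rule(a) && rule(b) && can_coexist(a, b))
				races++;
		}
	}
	return races;
}
```

In this model count_racing_pairs(rwcs_may_write) returns 0, which matches the claim that the rule excludes concurrent writers. The weaker rule does not: a task holding only cpus_write_lock and a task holding only cpuset_mutex can both write at the same time. That is why the second alternative needs cpus_read_lock as well as cpuset_mutex.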
Cheers,
Longman