[PATCH] cgroup/cpuset: skip hardwall ancestor scan in v2 mode in cpuset_current_node_allowed()
From: Chen Wandun
Date: Fri May 08 2026 - 02:30:11 EST

In cgroup v2, the non-hardwall fallthrough path in
cpuset_current_node_allowed() always ends up allowing the allocation:

  - CS_MEM_EXCLUSIVE and CS_MEM_HARDWALL are v1-only flags, toggled
    only via the cpuset.mem_exclusive / cpuset.mem_hardwall files,
    which do not exist in v2. Neither flag is ever set on any cpuset
    (including top_cpuset) in pure v2 mode.

  - As a result, nearest_hardwall_ancestor() always walks up to
    top_cpuset.

  - top_cpuset.mems_allowed is set to node_possible_map in v2 mode,
    so node_isset() on it is always true for any valid node.

The whole scan therefore boils down to taking callback_lock, walking
to the root and returning true. Short-circuit it by returning true
directly when is_in_v2_mode() holds, sparing the callback_lock
acquisition and the pointless walk.

Place the short-circuit after the __GFP_HARDWALL check so that the
generic hardwall enforcement for GFP_USER allocations remains in
effect: __GFP_HARDWALL requests still return false when the node is
outside mems_allowed, preserving cpuset.mems constraints for
__alloc_pages() callers (for which prepare_alloc_pages() sets
__GFP_HARDWALL unconditionally when cpusets are enabled).

Suggested-by: Michal Koutný <mkoutny@xxxxxxxx>
Signed-off-by: Chen Wandun <chenwandun@xxxxxxxxxxx>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)
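
For reviewers, the decision order argued for in the commit message can
be sketched as a small userspace model. This is only an illustration
under stated assumptions: every name below (model_cpuset,
model_node_allowed, the bitmask representation of mems_allowed) is
hypothetical and not a kernel API, and the other early-allow checks in
the real function are left out.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_cpuset {
	const struct model_cpuset *parent;	/* NULL for the root cpuset */
	unsigned long mems_allowed;		/* bit n set => node n allowed */
	bool hardwall;				/* stands in for the v1-only flags */
};

static bool node_in(unsigned long mask, int node)
{
	return (mask >> node) & 1UL;
}

/* Walk up until a hardwalled cpuset or the root is reached. */
static const struct model_cpuset *
model_nearest_hardwall(const struct model_cpuset *cs)
{
	while (!cs->hardwall && cs->parent)
		cs = cs->parent;
	return cs;
}

static bool model_node_allowed(const struct model_cpuset *cs, int node,
			       bool gfp_hardwall, bool v2_mode)
{
	if (node_in(cs->mems_allowed, node))
		return true;
	if (gfp_hardwall)	/* hardwall request: stop here */
		return false;
	if (v2_mode)		/* the short-circuit this patch adds */
		return true;
	/* v1 only: scan up to the nearest hardwall ancestor */
	return node_in(model_nearest_hardwall(cs)->mems_allowed, node);
}
```

In this model, a v2 cpuset never has hardwall set and the root allows
every node, so the final scan would return true anyway; the v2_mode
branch just reaches that answer without the walk.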
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a48901a0416a..b539f5b4d21e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4231,6 +4231,9 @@ bool cpuset_current_node_allowed(int node, gfp_t gfp_mask)
 	if (gfp_mask & __GFP_HARDWALL)	/* If hardwall request, stop here */
 		return false;
+	if (is_in_v2_mode())
+		return true;
+
 	/* Not hardwall and node outside mems_allowed: scan up cpusets */
 	spin_lock_irqsave(&callback_lock, flags);
--
2.43.0