Re: [PATCH v2 03/17] cpumask: Introduce cpu_preferred_mask
From: Shrikanth Hegde
Date: Wed Apr 08 2026 - 05:16:46 EST
Hi Yury. Thanks for going through the series.
On 4/8/26 1:57 AM, Yury Norov wrote:
On Wed, Apr 08, 2026 at 12:49:36AM +0530, Shrikanth Hegde wrote:
This patch does:
- Declare and define cpu_preferred_mask.
- Add get/set helpers for it.
Bits are set/cleared by the scheduler based on detected steal time.
A CPU is set to preferred when it comes online. Later it may be
marked as non-preferred depending on steal time values with
STEAL_MONITOR enabled.
Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
---
include/linux/cpumask.h | 22 ++++++++++++++++++++++
kernel/cpu.c | 6 ++++++
kernel/sched/core.c | 5 +++++
3 files changed, 33 insertions(+)
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 80211900f373..80c5cc13b8ad 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -1296,6 +1296,28 @@ static __always_inline bool cpu_dying(unsigned int cpu)
#endif /* NR_CPUS > 1 */
+/*
+ * All related wrappers kept together to avoid too many ifdefs
+ * See Documentation/scheduler/sched-arch.rst for details
+ */
+#ifdef CONFIG_PARAVIRT
+extern struct cpumask __cpu_preferred_mask;
+#define cpu_preferred_mask ((const struct cpumask *)&__cpu_preferred_mask)
+#define set_cpu_preferred(cpu, preferred) assign_cpu((cpu), &__cpu_preferred_mask, (preferred))
+
+static __always_inline bool cpu_preferred(unsigned int cpu)
+{
+ return cpumask_test_cpu(cpu, cpu_preferred_mask);
+}
+#else
+static __always_inline bool cpu_preferred(unsigned int cpu)
+{
+ return true;
+}
This doesn't look consistent, probably not correct. What if
I pass an offline CPU here? Is it still preferred?
The preferred CPU state follows the online state. This is done by the change
below in set_cpu_online(). So when a CPU goes offline, it is removed from
the preferred mask too.
The design invariant I wanted is that preferred is always a subset of online:
preferred <= online <= possible.
Later you say that preferred CPU is online + STEAL-approved one.
So in non-paravirtualized case, I believe, you should consider
There it would clearly be the same as the online CPUs.
that only online CPUs are preferred. What about dying CPUs? Can
they be preferred too?
When there is no CPU hotplug, preferred will be a subset of online.
Let's look at the different cases with CPU hotplug,
when STEAL_MONITOR is on and there is high steal time.
Say a 600-CPU system with SMT.
Case 1:
CPU 500 was offline, so its preferred bit was 0. After a while there was
high steal time, and preferred_cpus = <0-399>. Once the contention
was gone, since the code iterates cpu_smt_mask, it would set 500's preferred
bit to 1, even though it is offline.
Case 2:
All online CPUs were preferred and 500 was offline. After a while there was
high steal, and while iterating through cpu_smt_mask, after say 499 was done,
500 was brought online, which set its preferred bit.
Since it was part of the mask, 500 will then be marked preferred=0.
That's ok; it was meant to be anyway.
Case 3:
All online CPUs were preferred and 500 was offline. After a while there was
high steal and preferred_cpus = <0-399>, then 500 was brought online, which
set its preferred bit. In the next cycle, bringing it online causes more steal
time, and since it is the last CPU in the mask, it will be marked as
non-preferred. That's ok.
So Case 1 is the one where the construct is broken.
This is solvable by checking the online state in the steal time handling code.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d3b2bcb6008c..bad091f1f604 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11329,7 +11329,7 @@ void sched_steal_detection_work(struct work_struct *work)
if (cpumask_equal(cpu_smt_mask(last_cpu), cpu_smt_mask(this_cpu)))
return;
- for_each_cpu(tmp_cpu, cpu_smt_mask(last_cpu)) {
+ for_each_cpu_and(tmp_cpu, cpu_smt_mask(last_cpu), cpu_online_mask) {
set_cpu_preferred(tmp_cpu, false);
if (tick_nohz_full_cpu(tmp_cpu))
tick_nohz_dep_set_cpu(tmp_cpu, TICK_DEP_BIT_SCHED);
@@ -11345,7 +11345,7 @@ void sched_steal_detection_work(struct work_struct *work)
if (first_cpu >= nr_cpu_ids)
return;
- for_each_cpu(tmp_cpu, cpu_smt_mask(first_cpu))
+ for_each_cpu_and(tmp_cpu, cpu_smt_mask(first_cpu), cpu_online_mask)
set_cpu_preferred(tmp_cpu, true);
}
I had thought of this scenario, but hadn't looked at it from a consistency
point of view. It should be consistent since it is exposed to the user.
Functionality-wise it was okay, since the current code has enough checks to
schedule only on online CPUs; even is_cpu_allowed() returns true only
if the CPU is online. But I get the point, and the above diff should address it.
At least, please run cpumask_check() on the argument.
It is set either within set_cpu_online() or, in PATCH 15/17, by iterating
through cpu_smt_mask. That should always yield cpu < nr_cpu_ids.
I didn't get why cpumask_check() is needed again.
There's a top-comment describing all the system cpumasks. Except for
cpu_dying, it's nice and complete. Can you describe your new creature
there?
Ok. I can add a comment there.
Finally, I don't think that __cpu_preferred_mask should depend on
PARAVIRT config. Consider cpu_present_mask. It mirrors cpu_possible_mask
if hotplug is disabled, but it's still a real mask even in that case.
The way you're doing it, you spread CONFIG_PARAVIRT ifdefery pretty
much anywhere where people might want to use this new mask for anything
except for testing a bit.
One concern you had raised earlier was bloating of the code for systems with
CONFIG_PARAVIRT=n.
Maybe in some of the hot paths we could do an IS_ENABLED(CONFIG_PARAVIRT)
check, and that should be ok?
If so, we can get rid of a lot of this ifdefery.
cpu_preferred(cpu) is a bit check and shouldn't be that expensive.
Thanks,
Yury
+static __always_inline void set_cpu_preferred(unsigned int cpu, bool preferred) { }
+#endif
+
#define cpu_is_offline(cpu) unlikely(!cpu_online(cpu))
#if NR_CPUS <= BITS_PER_LONG
diff --git a/kernel/cpu.c b/kernel/cpu.c
index bc4f7a9ba64e..2d4d037680d4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3137,6 +3137,12 @@ void set_cpu_online(unsigned int cpu, bool online)
if (cpumask_test_and_clear_cpu(cpu, &__cpu_online_mask))
atomic_dec(&__num_online_cpus);
}
+
+ /*
+ * An online CPU is by default assumed to be preferred,
+ * until STEAL_MONITOR changes it.
+ */
+ set_cpu_preferred(cpu, online);
}
Here, preferred follows the online state.
/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f351296922ac..7ea05a7a717b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11228,3 +11228,8 @@ void sched_change_end(struct sched_change_ctx *ctx)
p->sched_class->prio_changed(rq, p, ctx->prio);
}
}
+
+#ifdef CONFIG_PARAVIRT
+struct cpumask __cpu_preferred_mask __read_mostly;
+EXPORT_SYMBOL(__cpu_preferred_mask);
+#endif
--
2.47.3