Re: [PATCH v4 16/20] sched/core: Introduce default arch handling code for inc/dec preferred CPUs

From: Yury Norov

Date: Thu Jun 18 2026 - 00:15:20 EST


On Wed, Jun 17, 2026 at 11:11:35PM +0530, Shrikanth Hegde wrote:
> Define default handlers for high/low steal time. An arch with better
> decision logic may override the default implementation.
>
> - If the steal time is higher than the threshold, reduce the number of
> preferred CPUs by 1 core. The last core in the intersection of active and
> preferred CPUs will be marked as non-preferred.
> Always ensure at least one core is left as preferred.
>
> - If the steal time is lower than the threshold, increase the number of
> preferred CPUs by 1 core. The first active core which is not in
> cpu_preferred_mask will be marked as preferred.
> If all cores are already preferred, bail out.

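And the code below does not do that, as far as I can see. The decrement
path picks cpumask_last(cpu_preferred_mask) without intersecting it with
cpu_active_mask, and its only guard compares the SMT mask of that CPU
with the SMT mask of the current CPU, which is not the same as keeping
at least one core preferred.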
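
To illustrate the behaviour the changelog describes, here is a minimal
userspace sketch. It is not kernel code and not a proposed replacement:
the masks are plain 64-bit words, and NR_CPUS, THREADS_PER_CORE,
core_mask(), mask_first(), mask_last(), sketch_dec_preferred() and
sketch_inc_preferred() are all hypothetical names invented for this
example. It assumes SMT siblings are numbered contiguously.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS          8
#define THREADS_PER_CORE 2	/* assumed SMT width, siblings contiguous */

/* Bitmask of all SMT siblings of the core that contains @cpu. */
static uint64_t core_mask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	int first = cpu - (cpu % THREADS_PER_CORE);

	return ((1ULL << THREADS_PER_CORE) - 1) << first;
}

/* Lowest set bit in @m, or -1 if @m is empty. */
static int mask_first(uint64_t m)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (m & (1ULL << i))
			return i;
	return -1;
}

/* Highest set bit in @m, or -1 if @m is empty. */
static int mask_last(uint64_t m)
{
	int cpu = -1;

	for (int i = 0; i < NR_CPUS; i++)
		if (m & (1ULL << i))
			cpu = i;
	return cpu;
}

/*
 * Drop the last core found in (active & preferred), unless that would
 * leave no active preferred core at all. Returns the new preferred mask.
 */
static uint64_t sketch_dec_preferred(uint64_t active, uint64_t preferred)
{
	uint64_t both = active & preferred;
	int last = mask_last(both);
	uint64_t core;

	if (last < 0)
		return preferred;

	core = core_mask(last);
	if (!(both & ~core))	/* this is the only preferred core left */
		return preferred;

	return preferred & ~(core & active);
}

/*
 * Add the first core found in (active & ~preferred). Nothing to do if
 * every active CPU is already preferred. Returns the new preferred mask.
 */
static uint64_t sketch_inc_preferred(uint64_t active, uint64_t preferred)
{
	int first = mask_first(active & ~preferred);

	if (first < 0)
		return preferred;

	return preferred | (core_mask(first) & active);
}
```

With eight CPUs that are all active and preferred (0xFF), one decrement
drops CPUs 6-7, and a decrement with only CPUs 0-1 preferred changes
nothing. The point of the sketch is that both the "last core" selection
and the "keep one core" check are derived from the masks themselves, not
from the CPU the work happens to run on.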

> Increase/Decrease may need to modify the splicing across NUMA nodes. It is
> being kept simple for now.
>
> Signed-off-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxx>
> ---
> v3->v4:
> - active instead of online
> - added comment for enabling tick for nohz_full.
>
> include/linux/sched.h | 2 ++
> kernel/sched/core.c | 61 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 63 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5b15353ed7ef..e435f3073ffc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2529,5 +2529,7 @@ struct steal_monitor_t {
> };
>
> extern struct steal_monitor_t steal_mon;
> +void arch_dec_preferred_cpus(struct steal_monitor_t *sm, u64 steal_ratio);
> +void arch_inc_preferred_cpus(struct steal_monitor_t *sm, u64 steal_ratio);
> #endif
> #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f1a91021e357..c77045055604 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -11400,6 +11400,67 @@ void sched_init_steal_monitor(void)
> steal_mon.sampling_period_ms = 1000; /* once per second */
> }
>
> +/*
> + * Default implementation of decrementing the preferred CPUs based on steal
> + * time. This is simple logic and decreases the preferred CPUs by 1 core.
> + * It takes out the last core in the active & preferred.
> + *
> + * Ensure at least one housekeeping core is always kept as preferred
> + *
> + * Could be overridden by arch-specific handling.
> + */
> +#ifndef arch_dec_preferred_cpus
> +void arch_dec_preferred_cpus(struct steal_monitor_t *sm, u64 steal_ratio)
> +{
> + int last_cpu, tmp_cpu;
> + int this_cpu = raw_smp_processor_id();
> +
> + last_cpu = cpumask_last(cpu_preferred_mask);
> +
> + /*
> + * If the core belongs to the housekeeping CPUs, no action is
> + * taken. This leaves at least one core preferred always.
> + * This ensures at least some CPUs are available to run
> + */
> + if (cpumask_equal(cpu_smt_mask(last_cpu), cpu_smt_mask(this_cpu)))
> + return;
> +
> + /*
> + * set tick bit for nohz_full CPU to push the task out. Once the tasks
> + * are pushed out, bit will be cleared
> + */
> + for_each_cpu_and(tmp_cpu, cpu_smt_mask(last_cpu), cpu_active_mask) {
> + set_cpu_preferred(tmp_cpu, false);
> + if (tick_nohz_full_cpu(tmp_cpu))
> + tick_nohz_dep_set_cpu(tmp_cpu, TICK_DEP_BIT_SCHED);
> + }
> +}
> +#endif
> +
> +/*
> + * Default implementation of incrementing preferred CPUs based on steal
> + * time. This is simple logic and increases the preferred CPUs by 1 core.
> + * It adds the first core in active & !preferred
> + *
> + * Nothing to do if active == preferred
> + *
> + * Could be overridden by arch-specific handling.
> + */
> +#ifndef arch_inc_preferred_cpus
> +void arch_inc_preferred_cpus(struct steal_monitor_t *sm, u64 steal_ratio)
> +{
> + int first_cpu, tmp_cpu;
> +
> + first_cpu = cpumask_first_andnot(cpu_active_mask, cpu_preferred_mask);
> + /* All CPUs are preferred. Nothing to increase further */
> + if (first_cpu >= nr_cpu_ids)
> + return;
> +
> + for_each_cpu_and(tmp_cpu, cpu_smt_mask(first_cpu), cpu_active_mask)
> + set_cpu_preferred(tmp_cpu, true);
> +}
> +#endif
> +
> /* This is only a skeleton. Subsequent patches introduce more of it */
> void sched_steal_detection_work(struct work_struct *work)
> {
> --
> 2.47.3