Re: [RFC PATCH 5/5] sched/fair: Proactive idle balance using push mechanism

From: K Prateek Nayak
Date: Thu Apr 10 2025 - 11:38:10 EST


On 4/10/2025 3:59 PM, Peter Zijlstra wrote:

[..snip..]

/*
* See if the non running fair tasks on this rq can be sent on other CPUs
* that fits better with their profile.
*/
static bool push_fair_task(struct rq *rq)
{
+ struct cpumask *cpus = this_cpu_cpumask_var_ptr(load_balance_mask);
+ struct task_struct *p = pick_next_pushable_fair_task(rq);
+ int cpu, this_cpu = cpu_of(rq);
+
+ if (!p)
+ return false;
+
+ if (!cpumask_and(cpus, nohz.idle_cpus_mask, housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)))
+ goto requeue;

So I think the main goal here should be to get rid of the whole single
nohz balancing thing.

This global state/mask has been shown to be a problem over and over again.

Ideally we keep a nohz idle mask per LLC (right next to the overload
mask you introduced earlier), along with a bit in the sched_domain tree
upwards of that to indicate a particular llc/ node / distance-group has
nohz idle.

Then if the topmost domain has the bit set it means there are nohz cpus
to be found, and we can (slowly) iterate the domain tree up from
overloaded LLC to push tasks around.

I'll go through fair.c to understand all the use cases of
"nohz.idle_cpus_mask" and then start with this bit for v2 to see if that
blows up in some way. I'll be back shortly.
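
To make sure I have the idea right, here is a rough userspace model of
the per-LLC mask plus the "has nohz idle" bit propagated up the domain
tree. This is only a sketch of my understanding, not kernel code:
"struct dom", nohz_idle_enter(), nohz_idle_exit() and
find_nohz_idle_cpu() are all made-up names, a plain uint64_t stands in
for a cpumask, and there is no locking or atomics, which the real thing
would obviously need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for one level of the domain tree (not struct sched_domain). */
struct dom {
	struct dom *parent;
	struct dom *child[MAX_CHILDREN];
	int nr_children;
	bool has_nohz_idle;	/* some CPU below this node is nohz idle */
	uint64_t idle_mask;	/* nohz idle CPUs; only used on LLC (leaf) nodes */
};

static void dom_add_child(struct dom *parent, struct dom *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->child[parent->nr_children++] = child;
	child->parent = parent;
}

/* CPU goes nohz idle: mark it in its LLC and propagate the bit upwards. */
static void nohz_idle_enter(struct dom *llc, int cpu)
{
	llc->idle_mask |= UINT64_C(1) << cpu;
	/* Stop early: if a node is already marked, so are all its ancestors. */
	for (struct dom *d = llc; d && !d->has_nohz_idle; d = d->parent)
		d->has_nohz_idle = true;
}

/* CPU leaves nohz idle: clear it, then clear the bit upwards while no child has it. */
static void nohz_idle_exit(struct dom *llc, int cpu)
{
	llc->idle_mask &= ~(UINT64_C(1) << cpu);
	if (llc->idle_mask)
		return;

	llc->has_nohz_idle = false;
	for (struct dom *d = llc->parent; d; d = d->parent) {
		bool any = false;

		for (int i = 0; i < d->nr_children; i++)
			any |= d->child[i]->has_nohz_idle;
		if (any)
			break;
		d->has_nohz_idle = false;
	}
}

static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Descend from a marked node to some nohz idle CPU below it. */
static int find_below(const struct dom *d)
{
	if (!d->has_nohz_idle)
		return -1;
	if (!d->nr_children)
		return first_cpu(d->idle_mask);
	for (int i = 0; i < d->nr_children; i++) {
		int cpu = find_below(d->child[i]);

		if (cpu >= 0)
			return cpu;
	}
	return -1;
}

/* Walk up from the overloaded LLC; the first marked level is the closest one. */
static int find_nohz_idle_cpu(const struct dom *llc)
{
	for (const struct dom *d = llc; d; d = d->parent)
		if (d->has_nohz_idle)
			return find_below(d);
	return -1;
}
```

With two LLCs under one root, an overloaded LLC with no idle CPUs of its
own only has to look at the root's bit to learn whether the search is
worth doing at all, and it never touches a global mask.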


Anyway, yes, you gotta start somewhere :-)

Thanks a ton for the initial review. I'll go analyze more to see what
bits are making benchmarks go sad.

--
Thanks and Regards,
Prateek