Re: [PATCH V2] sched: Improve load balancing in the presence of idle CPUs

From: Preeti U Murthy
Date: Fri Mar 27 2015 - 12:46:55 EST


Hi Morten,

On 03/27/2015 08:08 PM, Morten Rasmussen wrote:
> Hi Preeti,
>
> On Thu, Mar 26, 2015 at 01:02:44PM +0000, Preeti U Murthy wrote:
>> Fix this, by checking if a CPU was woken up to do nohz idle load
>> balancing, before it does load balancing upon itself. This way we allow
>> idle CPUs across the system to do load balancing which results in
>> quicker spread of load, instead of performing load balancing within the
>> local sched domain hierarchy of the ILB CPU alone under circumstances
>> such as above.
>>
>> Signed-off-by: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>
>> ---
>> Changes from V1:
>> 1. Added relevant comments
>> 2. Wrapped lines to a fixed width in the changelog
>>
>> kernel/sched/fair.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index bcfe320..8b6d0d5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7660,14 +7660,16 @@ static void run_rebalance_domains(struct softirq_action *h)
>> enum cpu_idle_type idle = this_rq->idle_balance ?
>> CPU_IDLE : CPU_NOT_IDLE;
>>
>> - rebalance_domains(this_rq, idle);
>> -
>> /*
>> * If this cpu has a pending nohz_balance_kick, then do the
>> * balancing on behalf of the other idle cpus whose ticks are
>> - * stopped.
>> + * stopped. Do nohz_idle_balance *before* rebalance_domains to
>> + * give the idle cpus a chance to load balance. Else we may
>> + * load balance only within the local sched_domain hierarchy
>> + * and abort nohz_idle_balance altogether if we pull some load.
>> */
>> nohz_idle_balance(this_rq, idle);
>> + rebalance_domains(this_rq, idle);
>
> IIUC, this change means that you will always wake up one more cpu than
> necessary unless you have enough work for all cpus in the system. For
> example, cpu0 is busy with two tasks and cpu1+2 are nohz_idle. cpu0
> kicks cpu1 to do a nohz_idle_balance(). With the change it will balance
> on behalf of cpu2 first and pull one of the tasks from cpu0. When done
> with nohz_idle_balance() cpu1 has nothing left to pull when balancing
> itself and goes back to sleep.
>
> My concern is that this will increase the number of cpu wakeups quite
> significantly. Am I missing something?

It's true that we wake up all idle CPUs. But I think we are justified in
doing so, given that nohz_idle_balance() was deemed necessary. As I see
it, the logic behind nohz_idle_balance() is that all idle CPUs should be
brought into action when some scheduling group is found to be busy. With
the current code that does not happen if one of them happens to pull
tasks. That behavior does not make sense to me.

With the current code, I think it is hard to estimate how many idle CPU
wakeups would be sufficient to balance out the system load. But I
certainly feel that waking up all of them to perform the load balancing
that was asked for is far better than waking up none of them. This
favors performance. I agree the extra wakeups would hurt power savings,
but I would still be fine with that, given the undesirable scenario
described in the changelog that occurs otherwise.

Regards
Preeti U Murthy
>
> Morten
>
