Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
From: Vincent Guittot
Date: Fri May 13 2016 - 04:22:54 EST
On 12 May 2016 at 23:48, Yuyang Du <yuyang.du@xxxxxxxxx> wrote:
> On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen wrote:
>> Commit-ID: cfa10334318d8212d007da8c771187643c9cef35
>> Gitweb: http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35
>> Author: Morten Rasmussen <morten.rasmussen@xxxxxxx>
>> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100
>> Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> CommitDate: Thu, 12 May 2016 09:55:33 +0200
>>
>> sched/fair: Correct unit of load_above_capacity
>>
>> In calculate_imbalance() load_above_capacity currently has the unit
>> [capacity] while it is used as being [load/capacity]. Not only is it
>> wrong it also makes it unlikely that load_above_capacity is ever used
>> as the subsequent code picks the smaller of load_above_capacity and
>> the avg_load
>>
>> This patch ensures that load_above_capacity has the right unit
>> [load/capacity].
load_above_capacity has the unit [load] and is then compared with other load
>>
>> Signed-off-by: Morten Rasmussen <morten.rasmussen@xxxxxxx>
>> [ Changed changelog to note it was in capacity unit; +rebase. ]
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
>> Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Mike Galbraith <efault@xxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@xxxxxxx
>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>> ---
>> kernel/sched/fair.c | 6 ++++--
>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 2338105..218f8e8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>> if (busiest->group_type == group_overloaded &&
>> local->group_type == group_overloaded) {
>> load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE;
>> - if (load_above_capacity > busiest->group_capacity)
>> + if (load_above_capacity > busiest->group_capacity) {
>> load_above_capacity -= busiest->group_capacity;
>> - else
>> + load_above_capacity *= NICE_0_LOAD;
>> + load_above_capacity /= busiest->group_capacity;
If I follow correctly the sequence,
load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE
- busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity
so
load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE /
busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD
so the unit is [load]
Lets take the example of a group made of 2 cores with 2 threads per
core and the capacity of each core is 1,5* SCHED_CAPACITY_SCALE so the
capacity of the group is 3*SCHED_CAPACITY_SCALE.
If we have 6 tasks on this group :
load_above capacity = 1 *NICE_0_LOAD which means we want to remove no
more than 1 tasks from the group and let 5 tasks in the group whereas
we don't expect to have more than 4 tasks as we have 4 CPUs and
probably less because the compute capacity of each CPU is less than
the default one
So i would have expected
load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD -
busiest->group_capacity * NICE_0_LOAD / SCHED_CAPACITY_SCALE
load_above capacity = 3*NICE_0_LOAD which means we want to remove no
more than 3 tasks and let 3 tasks in the group
Regards,
Vincent
>> + } else
>> load_above_capacity = ~0UL;
>> }
>
> Hi Morten,
>
> I got the feeling this might be wrong, the NICE_0_LOAD should be scaled down.
> But I hope I am wrong.
>
> Vincent, could you take a look?