Re: [QUERY] Confusing usage of rq->nr_running in load balancing

From: Vincent Guittot
Date: Wed Sep 03 2014 - 12:58:40 EST


On 3 September 2014 14:21, Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,

Hi Preeti,

>
> There are places in kernel/sched/fair.c in the load balancing part where
> rq->nr_running is used as against cfs_rq->nr_running. At least I could
> not make out why the former was used in the following scenarios.
> It looks to me that it can very well lead to incorrect load balancing.
> Also I did not pay attention to the numa balancing part of the code
> while skimming through this file to catch this scenario. There are a
> couple of places there too which need to be scrutinized.
>
> 1. load_balance(): The check (busiest->nr_running > 1)
> The load balancing would be futile if there are tasks of other
> scheduling classes, wouldn't it?

agree with you

>
> 2. active_load_balance_cpu_stop(): A similar check and a similar
> consequence as 1 here.

agree with you

>
> 3. nohz_kick_needed() : We check for more than one task on the runqueue
> and hence trigger load balancing even if there are rt-tasks.

I can see one potentiel reason why rq->nr_running is interesting that
is the group capacity might have changed because of non cfs tasks
since last load balance. So we need to monitor the change of the
groups' capacity to ensure that the average load of each group is
still in the same level

>
> 4. cpu_avg_load_per_task(): This stands out among the rest as an
> incorrect usage of rq->nr_running in place of cfs_rq->nr_running. We
> divide the load associated with the cfs_rq by the number of tasks on the
> rq. This will make the cfs_rq load look smaller.

This one is solved in the consolidation of cpu_capacity patchset

>
> 5. task_hot() : I am not too sure about the consequences of using
> rq->nr_running here.

IIUC, the goal is to check if other tasks are running on the dst_rq so
rq->nr_running is the good one

>
> 6. update_sg_lb_stats(): sgs->sum_nr_running is the sum of
> rq->nr_running and propogates thus throughout the load balancing code path.

This one is solved in the consolidation of cpu_capacity patchset
>
> 7. sg_capacity_factor(): Returns the capacity factor measured against
> the cpu capacity available to fair tasks. We then compare this with the
> rq->nr_running in update_sg_lb_stats(), update_sd_pick_busiest() and
> calculate_imbalance()

This one is removed with the consolidation of cpu_capacity patchset

>
> 8. find_busiest_queue(): This anomaly shows up when we filter against
> rq->nr_running == 1 and imbalance cannot be taken care of by the
> existing task on this rq.

agree with you even if the test with wl should prevent wrong decision
as a wl will be null if no cfs task are present

Regards
Vincent
>
> Did I miss something or is it true that the usage of rq->nr_running in
> the above places is incorrect?
>
> Thanks
>
> Regards
> Preeti U Murthy
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/