Re: [PATCH V5 1/2] sched/fair: Fix load_balance() affinity redo path

From: Peter Zijlstra
Date: Wed Jul 05 2017 - 07:22:44 EST


On Wed, Jun 07, 2017 at 01:18:57PM -0600, Jeffrey Hugo wrote:
> If load_balance() fails to migrate any tasks because all tasks were
> affined, load_balance() removes the source cpu from consideration and
> attempts to redo and balance among the new subset of cpus.
>
> There is a bug in this code path where the algorithm considers all active
> cpus in the system (minus the source that was just masked out). This is
> not valid for two reasons: some active cpus may not be in the current
> scheduling domain and one of the active cpus is dst_cpu. These cpus should
> not be considered, as we cannot pull load from them.
>
> Instead of failing out of load_balance(), we may end up redoing the search
> with no valid cpus and incorrectly concluding the domain is balanced.
> Additionally, if the group_imbalance flag was just set, it may also be
> incorrectly unset, thus the flag will not be seen by other cpus in future
> load_balance() runs as that algorithm intends.
>
> Fix the check by removing cpus not in the current domain and the dst_cpu
> from consideration, thus limiting the evaluation to valid remaining cpus
> from which load might be migrated.
>
> Co-authored-by: Austin Christ <austinwc@xxxxxxxxxxxxxx>
> Co-authored-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Signed-off-by: Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx>
> Tested-by: Tyler Baicar <tbaicar@xxxxxxxxxxxxxx>

Yes, this looks good. Thanks!
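
For readers following the thread, the redo check described in the changelog
can be sketched in userspace as below. This is only an illustration under
stated assumptions, not the code from the patch: mask_t, cpu_bit(),
redo_buggy() and redo_fixed() are hypothetical names, and a 64-bit integer
stands in for the kernel's struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t mask_t;

static mask_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (mask_t)1 << cpu;
}

/*
 * Behaviour described as buggy: after masking out the pinned source CPU,
 * every other active CPU in the system still counts as a candidate.
 */
static bool redo_buggy(mask_t active, unsigned int src_cpu)
{
	mask_t cpus = active & ~cpu_bit(src_cpu);

	return cpus != 0;
}

/*
 * Behaviour described as the fix: candidates are limited to active CPUs
 * inside the current domain, and dst_cpu is excluded because load cannot
 * be pulled from the destination itself.
 */
static bool redo_fixed(mask_t active, mask_t domain_span,
		       unsigned int src_cpu, unsigned int dst_cpu)
{
	mask_t cpus = active & domain_span;

	cpus &= ~cpu_bit(src_cpu);
	cpus &= ~cpu_bit(dst_cpu);

	return cpus != 0;
}
```

With four active cpus (mask 0xF), a domain spanning cpus 0-1, dst_cpu 0 and a
pinned source cpu 1, redo_buggy() still asks for a redo because cpus 2 and 3
remain in the mask, while redo_fixed() finds no valid candidate and lets the
caller fail out instead of concluding the domain is balanced.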