Re: [PATCH] sched, fair: Allow a small load imbalance between low utilisation SD_NUMA domains v4

From: Vincent Guittot
Date: Fri Jan 17 2020 - 08:08:28 EST


Hi Mel,


On Thu, 16 Jan 2020 at 17:35, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2020 at 10:13:20AM +0000, Mel Gorman wrote:
> > Changelog since V3
> > o Allow a fixed imbalance based on a basic comparison with 2 tasks. This turned out to
> > be as good or better than allowing an imbalance based on the group weight
> > without worrying about potential spillover of the lower scheduler domains.
> >
> > Changelog since V2
> > o Only allow a small imbalance when utilisation is low to address reports that
> > higher utilisation workloads were hitting corner cases.
> >
> > Changelog since V1
> > o Alter code flow (vincent.guittot)
> > o Use idle CPUs for comparison instead of sum_nr_running (vincent.guittot)
> > o Note that the division is still in place. Without it and taking
> > imbalance_adj into account before the cutoff, two NUMA domains
> > do not converge as being equally balanced when the number of
> > busy tasks equals the size of one domain (50% of the sum).
> >
> > The CPU load balancer balances between different domains to spread load
> > and strives to have equal balance everywhere. Communicating tasks can
> > migrate so they are topologically close to each other but these decisions
> > are independent. On a lightly loaded NUMA machine, two communicating tasks
> > pulled together at wakeup time can be pushed apart by the load balancer.
> > In isolation, the load balancer decision is fine but it ignores the tasks'
> > data locality and the wakeup/LB paths continually conflict. NUMA balancing
> > is also a factor but it also simply conflicts with the load balancer.
> >
> > This patch allows a fixed degree of imbalance of two tasks to exist
> > between NUMA domains regardless of utilisation levels. In many cases,
> > this prevents communicating tasks being pulled apart. It was evaluated
> > whether the imbalance should be scaled to the domain size. However, no
> > additional benefit was measured across a range of workloads and machines
> > and scaling adds the risk that lower domains have to be rebalanced. While
> > this could change again in the future, such a change should specify the
> > use case and benefit.
> >
>
> Any thoughts on whether this is ok for tip or are there suggestions on
> an alternative approach?

I have just finished running some tests on my system with your patch
and I haven't seen any noticeable changes so far, which was somewhat
expected. The tests that I usually run use more than 4 tasks on my
2-node system; the only exception is perf sched pipe, and the results
for this test stay the same with and without your patch. I'm curious
whether this impacts Phil's tests, which run the LU.c benchmark with
some CPU-burning tasks.
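
As a rough illustration of the behaviour the changelog describes (a
fixed allowance of 2 tasks, applied only when the source domain is
lightly loaded), here is a minimal sketch in C. It is not the code from
the patch: the function name, the constant and the parameters are
hypothetical and only show the shape of the comparison.

```c
/* Hypothetical constant: the fixed allowance of 2 tasks from the changelog. */
#define NUMA_IMBALANCE_MIN_SKETCH 2

/*
 * Hypothetical helper, not a kernel function.
 *
 * imbalance:          what the load balancer would otherwise try to move
 *                     between two NUMA domains.
 * busiest_nr_running: number of runnable tasks in the busier domain.
 *
 * If the busier domain runs no more than the fixed allowance, report no
 * imbalance so that a pair of communicating tasks is left together.
 * Otherwise the computed imbalance is returned unchanged.
 */
long numa_imbalance_sketch(long imbalance, unsigned int busiest_nr_running)
{
	if (busiest_nr_running <= NUMA_IMBALANCE_MIN_SKETCH)
		return 0;

	return imbalance;
}
```

Under these assumptions a workload with more than 2 runnable tasks in
the busier domain never takes the early return, which would be
consistent with tests using more than 4 tasks on 2 nodes showing no
difference.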

>
> --
> Mel Gorman
> SUSE Labs