Re: [PATCH 06/13] sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity

From: Mel Gorman
Date: Tue Feb 18 2020 - 03:47:57 EST


On Tue, Feb 18, 2020 at 11:32:44AM +0800, Hillf Danton wrote:
>
> On Mon, 17 Feb 2020 15:06:15 +0000 Mel Gorman wrote:
> > On Mon, Feb 17, 2020 at 09:20:19PM +0800, Hillf Danton wrote:
> > > > /*
> > > > - * If the improvement from just moving env->p direction is better
> > > > - * than swapping tasks around, check if a move is possible.
> > > > + * If dst node has spare capacity, then check if there is an
> > > > + * imbalance that would be overruled by the load balancer.
> > > > */
> > > > - maymove = !load_too_imbalanced(src_load, dst_load, env);
> > > > + if (env->dst_stats.node_type == node_has_spare) {
> > >
> > > so maymove should be true here.
> > >
> >
> > Performance suffers on numerous workloads that way.
> >
> Suspect you are meaning something like
>
> maymove = adjust_numa_imbalance(true,
> env->src_stats.nr_running);
>
> given dst node's spare capacity.
>

adjust_numa_imbalance takes the imbalance as its first parameter, not a
boolean, and its result is not unconditionally true, so I don't get what
you mean. Can you propose a patch on top of the entire series that
explains what you suggest, please?
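
For illustration only, here is a rough userspace sketch of the check in the
hunk quoted below. The helper names and the threshold of 2 are assumptions
made for this example and are not taken from the patch; the stand-in only
models a helper that takes an integer imbalance and may discard it when the
source is nearly idle.

```c
/* Assumed threshold, chosen for illustration only. */
#define ASSUMED_IMBALANCE_MIN 2

/*
 * Hypothetical stand-in for adjust_numa_imbalance(): the first argument
 * is an integer imbalance, not a boolean. The imbalance is ignored when
 * the source is running few enough tasks, otherwise it is returned as-is.
 */
static int adjust_numa_imbalance_sketch(int imbalance, int src_nr_running)
{
	if (src_nr_running <= ASSUMED_IMBALANCE_MIN)
		return 0;

	return imbalance;
}

/*
 * Mirrors the quoted hunk: would moving one task from src to dst leave
 * dst running more tasks than src? A non-zero result means the move
 * creates an imbalance.
 */
static int imbalance_after_move(int src_nr_running, int dst_nr_running)
{
	int src_running = src_nr_running - 1;
	int dst_running = dst_nr_running + 1;
	int imbalance = dst_running - src_running;

	if (imbalance < 0)
		imbalance = 0;

	return adjust_numa_imbalance_sketch(imbalance, src_running);
}
```

Under these assumptions, passing `true` as the first argument is the same as
passing an imbalance of 1, and whether that survives depends entirely on the
second argument.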

> > > > + unsigned int imbalance;
> > > > + int src_running, dst_running;
> > > > +
> > > > + /* Would movement cause an imbalance? */
> > > > + src_running = env->src_stats.nr_running - 1;
> > > > + dst_running = env->dst_stats.nr_running + 1;
> > > > + imbalance = max(0, dst_running - src_running);
> > > > + imbalance = adjust_numa_imbalance(imbalance, src_running);
> > > > +
> > > The imbalance could be ignored if src domain is idle enough, and no move
> > > could be expected.
> > >
> >
> > Again, it hits corner cases. While there is scope for allowing some
> > degree of imbalance, it needs to be a separate patch on top of this.
> > It's something I intend to examine but only once this series is out of
> > the way because the NUMA and load balancer do need to be using similar
> > logic first or it gets a bit fragile.
> >
> Add this to log or comment.
>

It's somewhat codified by the general comment "Would movement cause an
imbalance?". The hint is that any change in the NUMA balancer should check
whether it is in conflict with the load balancer. Being specific runs the
risk that the comment goes stale, which may be dangerously misleading if
it's later wrong and a developer trusts the comment instead of checking the
code. Generally I do try to explain fully what is happening in comments, but
this is a case where I really want developers to check the load balancer
code if they are updating the NUMA balancer.

> > With this patch, and the series in general, it does mean that some tasks
> > fail to migrate to a CPU local to the memory being accessed even though
> > there are CPUs available but having the NUMA balancer and load balancer
> > override each other is not free either.
> >
>
> Ditto
>

I can update the changelog if there is a v4 release. Right now there are
no substantial changes suggested by review. That may change depending on
other review feedback and what you think the call to adjust_numa_imbalance
should look like. Also bear in mind that the changelog will become stale
if/when adjust_numa_imbalance is altered to allow a small imbalance between
NUMA nodes. While I plan to do that, this series should be finalised first.

--
Mel Gorman
SUSE Labs