Re: [PATCH 4/4] sched/numa: Do not move imbalanced load purely on the basis of an idle CPU

From: Srikar Dronamraju
Date: Fri Sep 07 2018 - 09:42:13 EST


* Peter Zijlstra <peterz@xxxxxxxxxxxxx> [2018-09-07 14:44:32]:

> On Fri, Sep 07, 2018 at 01:37:39PM +0100, Mel Gorman wrote:
> > On Fri, Sep 07, 2018 at 01:33:09PM +0200, Peter Zijlstra wrote:
> > > > ---
> > > > kernel/sched/fair.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index d59d3e00a480..d4c289c11012 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -1560,7 +1560,7 @@ static bool task_numa_compare(struct task_numa_env *env,
> > > > goto unlock;
> > > >
> > > > if (!cur) {
> > > > - if (maymove || imp > env->best_imp)
> > > > + if (maymove)
> > > > goto assign;
> > > > else
> > > > goto unlock;
> > >
> > > Srikar's patch here:
> > >
> > > http://lkml.kernel.org/r/1533276841-16341-4-git-send-email-srikar@xxxxxxxxxxxxxxxxxx
> > >
> > > Also frobs this condition, but in a less radical way. Does that yield
> > > similar results?
> >
> > I can check. I do wonder of course if the less radical approach just means
> > that automatic NUMA balancing and the load balancer simply disagree about
> > placement at a different time. It'll take a few days to have an answer as
> > the battery of workloads to check this take ages.
>
> Yeah, I was afraid it would.. Srikar, can you also evaluate, I suspect
> we'll have to pick one of these two patches.

I can certainly run some benchmarks on the two patches.
However, first compare Mel's patch with the one at
http://lkml.kernel.org/r/1533276841-16341-4-git-send-email-srikar@xxxxxxxxxxxxxxxxxx

Mel's patch:

 	if (!cur) {
-		if (maymove || imp > env->best_imp)
+		if (maymove)
 			goto assign;
 		else

http://lkml.kernel.org/r/1533276841-16341-4-git-send-email-srikar@xxxxxxxxxxxxxxxxxx:

 	if (!cur) {
-		if (maymove || imp > env->best_imp)
+		if (maymove && moveimp >= env->best_imp)
 			goto assign;
 		else

With Mel's fix, if we have already found a candidate task to swap with and then
encounter an idle CPU, we go ahead and overwrite the swap candidate. There is
always a chance that the swap candidate is a better fit than moving to the idle CPU.

With the patch that is in your queue, we move to the idle CPU only if that is at
least as good as the swap candidate (moveimp >= env->best_imp). So it is in no way
less radical than Mel's patch, and it is probably more correct.
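
To make the difference concrete, here is a minimal userspace sketch of the three
conditions for the idle-CPU (!cur) case. It is not kernel code: the function names
are hypothetical, the improvement values are made up, and it assumes imp and
moveimp hold the same value at this point in task_numa_compare(). Each function
returns true where the kernel would "goto assign" and false where it would
"goto unlock".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the "!cur" branch (destination CPU is idle).
 * Only the boolean conditions are taken from the diffs above; the
 * function names and the sample values are assumptions for illustration.
 */

/* Condition before either patch: move if allowed, or if it beats the best. */
static bool assign_before(bool maymove, long imp, long best_imp)
{
	return maymove || imp > best_imp;
}

/* Mel's patch: move whenever a move is allowed, ignoring best_imp. */
static bool assign_mel(bool maymove)
{
	return maymove;
}

/* Patch in the queue: move only if allowed and at least as good as best_imp. */
static bool assign_srikar(bool maymove, long moveimp, long best_imp)
{
	return maymove && moveimp >= best_imp;
}

/* Walk through the case discussed above: a swap candidate was already found. */
void check_examples(void)
{
	long best_imp = 10;	/* improvement of the swap candidate found earlier */
	long moveimp = 5;	/* improvement of moving to the idle CPU */

	/* Mel's condition overwrites the better swap candidate. */
	assert(assign_mel(true));

	/* The queued condition keeps the swap candidate. */
	assert(!assign_srikar(true, moveimp, best_imp));

	/* An equally good or better move to the idle CPU is still taken. */
	assert(assign_srikar(true, 10, best_imp));

	/* With maymove false, both patches refuse; the old code did not. */
	assert(assign_before(false, 15, best_imp));
	assert(!assign_mel(false));
	assert(!assign_srikar(false, 15, best_imp));
}
```

With best_imp = 10 and moveimp = 5, Mel's condition replaces the swap candidate
while the queued condition keeps it. That is the case I am worried about.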

--
Thanks and Regards
Srikar Dronamraju