Re: [PATCH v2] sched/fair: fix imbalance due to CPU affinity
From: Vincent Guittot
Date: Fri Jul 05 2019 - 08:23:33 EST
On Tue, 2 Jul 2019 at 16:29, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 02/07/2019 11:00, Vincent Guittot wrote:
> >> Does that want a
> >>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Fixes: afdeee0510db ("sched: Fix imbalance flag reset")
> >
> > I was not sure whether this was introduced by this patch or by
> > later changes. I haven't been able to test it on such an old kernel
> > with my platform.
> >
>
> Right, seems like
>
> 65a4433aebe3 ("sched/fair: Fix load_balance() affinity redo path")
>
> also played in this area. On the surface it looks like it only reduced
> the number of CPUs the load_balance() redo can use (and interestingly it
> mentions the exact same bug as you observed, though triggered slightly
> differently).
>
> I'd be inclined to say that the issue was introduced by afdeee0510db, since
> from looking at the code from that time I can see the issue happening:
I agree that, from reading the code, that patch seems to be the root cause.
But it also means that the bug has been there for almost 5 years and was
never seen until I ran some functional tests on my rework of the
load balance.
That's why a real test would have been useful: it would have confirmed
that nothing else changed in the meantime.
>
> - try to pull from a CPU with only tasks pinned to itself
> - set sgc->imbalance
> - redo with a CPU that sees no big imbalance
> - goto out_balanced
> - LBF_ALL_PINNED is still set in env.flags but we clear sgc->imbalance
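
To make the sequence above easier to follow, here is a minimal standalone
sketch of it. This is a toy model and not the kernel code: struct toy_env,
toy_balance_pass() and the keep_if_all_pinned guard are hypothetical names
made up for illustration, the flag value is arbitrary, and the guard only
shows one possible way to keep the hint, which is not necessarily what the
patch does.

```c
#include <stdbool.h>

/* Flag name borrowed from the thread; the value here is illustrative only. */
#define LBF_ALL_PINNED 0x01u

/* Toy stand-in for the load-balance environment; not the kernel's struct. */
struct toy_env {
	unsigned int flags;
	int sgc_imbalance;	/* models sgc->imbalance on the parent group */
};

/*
 * Walk the five steps listed above and return the final value of
 * sgc_imbalance. keep_if_all_pinned is a hypothetical guard, added only
 * to contrast the two outcomes.
 */
static int toy_balance_pass(bool keep_if_all_pinned)
{
	struct toy_env env = { .flags = 0, .sgc_imbalance = 0 };

	/* 1. Busiest CPU only has tasks pinned to itself: nothing can move. */
	env.flags |= LBF_ALL_PINNED;

	/* 2. Affinity blocked the pull, so record it for the parent level. */
	env.sgc_imbalance = 1;

	/* 3. Redo without that CPU; the remaining CPUs show no big imbalance. */
	/* 4. That sends us to the out_balanced handling below. */

	/* 5. out_balanced: the flag is still set, yet the hint gets cleared. */
	if (keep_if_all_pinned && (env.flags & LBF_ALL_PINNED))
		return env.sgc_imbalance;	/* hypothetical guard keeps it */

	env.sgc_imbalance = 0;
	return env.sgc_imbalance;
}
```

Calling toy_balance_pass(false) follows the path described in the list and
ends with the imbalance hint cleared, so the parent level never sees it;
toy_balance_pass(true) shows the hint surviving when the clear is skipped
while LBF_ALL_PINNED is still set.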
>
> >>
> >> ?
> >>