Re: [RFC][PATCH 1/5] sched/fair: Fix select_idle_cpu()s cost accounting

From: Mel Gorman
Date: Sat Jan 09 2021 - 09:13:19 EST


On Fri, Jan 08, 2021 at 08:45:44PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 08, 2021 at 04:10:51PM +0100, Vincent Guittot wrote:
> > Also, there is another problem (that I'm investigating) which is that
> > this_rq()->avg_idle is stalled when your cpu is busy. Which means that
> > this avg_idle can just be a very old and meaningless value. I think
> > that we should decay it periodically to reflect there is less and less
>
> https://lkml.kernel.org/r/20180530143105.977759909@xxxxxxxxxxxxx
>
> :-)

This needs to be revived. I'm of the opinion that your initial series
needs to be split somewhat into "single scan for core" parts followed by
the "Fix depth selection". I have tests in the queue and one of them is
just patch 2 on its own. Preliminary results for patch 2 on its own do not
look bad but that's based on one test (tbench). It'll be tomorrow before
it works through variants of patch 1 which I suspect will be inconclusive
and make me more convinced it should be split out separately.

The single scan for core would be patches 2-4 of the series this thread
is about which is an orthogonal problem to avoiding repeated scans of
the same runqueues during a single wakeup. I would prefer to drop patch
5 initially because the has_idle_cores throttling mechanism for idle core
searches is reasonably effective and the first round of tests indicated
that changing it was inconclusive and should be treated separately.

The depth scan stuff would go on top because right now the depth scan is
based on magic numbers and no amount of shuffling that around will make
it less magic without an avg_idle value that decays and potentially the
scan cost also aging. It's much less straight-forward than the single
scan aspect.

Switching idle core searches to SIS_PROP should also be treated
separately.

Thoughts? I don't want to go too far in this direction on my own because
the testing requirements are severe in terms of time. Even then no matter
how much I test, stuff will be missed as it's sensitive to domain sizes,
CPU generation, cache topology, cache characteristics, domain utilisation,
architecture, kitchen sink etc.

--
Mel Gorman
SUSE Labs