Re: [PATCH 00/10] sched/fair: wake_affine improvements

From: Mel Gorman
Date: Mon Apr 26 2021 - 07:41:19 EST


On Mon, Apr 26, 2021 at 04:09:40PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> [2021-04-23 09:25:32]:
>
> > On Thu, Apr 22, 2021 at 03:53:16PM +0530, Srikar Dronamraju wrote:
> > > Recently we found that some of the benchmark numbers on Power10 were lesser
> > > than expected. Some analysis showed that the problem lies in the fact that
> > > L2-Cache on Power10 is at core level i.e only 4 threads share the L2-cache.
> > >
> >
> > I didn't get the chance to review this properly although I am suspicious
> > of tracking idle_core and updating that more frequently. It becomes a very
> > hot cache line that bounces. I did experiment with tracking an idle core
> > but the data either went stale too quickly or the updates incurred more
> > overhead than a reduced search saved.
> >
>
> This change does increase the number of times we read the idle-core. There
> are also more places where we try to update the idle-core. However I feel
> the number of times, we actually update the idle-core now will be much
> lesser than previous, because we are mostly doing a conditional update. i.e
> we are updating the idle-core only if the waking up CPU happens to be part
> of our core.
>

Increased cache misses may be detectable from perf.

> Also if the system is mostly lightly loaded, we check for
> available_idle_cpu, so we may not look for an idle-core. If the system is
> running a CPU intensive task, then the idle-core will most likely to be -1.
> Its only the cases where the system utilization keeps swinging between
> lightly loaded to heavy load, that we would end up checking and setting
> idle-core.
>

But this is a "how long is a piece of string" question because the benefit
of tracking an idle core depends on the interarrival time of wakeups,
the domain utilisation and the length of time tasks are running. When
I was looking at the area, I tracked the SIS efficiency to see how much
each change was helping. The patch no longer applies but the stats are
understood by mmtests if you wanted to forward port it. It's possible
you would do something similar but specific to idle_core -- e.g. track
how often it's updated, how often it's read, how often a CPU is returned
and how often it's still an idle core and use those stats to calculate
hit/miss ratios.
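
As a rough illustration of the bookkeeping described above, here is a
minimal userspace sketch. The struct, its field names and the helper
functions are all hypothetical; they are not taken from the kernel or
from the old SIS patch. It also assumes the four events are simply
counted and then turned into integer percentages.

```c
#include <assert.h>

/*
 * Hypothetical counters for an idle_core hint. None of these names
 * exist in the kernel; they only mirror the four events listed above.
 */
struct idle_core_stats {
	unsigned long updates;    /* times the hint was written */
	unsigned long reads;      /* times the hint was read */
	unsigned long returned;   /* reads that yielded a candidate CPU */
	unsigned long still_idle; /* candidates whose core was still idle */
};

/* Integer percentage of num over den; 0 when nothing was counted. */
static unsigned long pct(unsigned long num, unsigned long den)
{
	return den ? (num * 100) / den : 0;
}

/* Hit ratio: how often a returned CPU was still on an idle core. */
static unsigned long idle_core_hit_pct(const struct idle_core_stats *s)
{
	/* A hit can only be recorded for a CPU that was returned. */
	assert(s->still_idle <= s->returned);
	return pct(s->still_idle, s->returned);
}

/* Miss ratio: returned CPUs whose core was no longer idle (stale hint). */
static unsigned long idle_core_miss_pct(const struct idle_core_stats *s)
{
	assert(s->still_idle <= s->returned);
	return pct(s->returned - s->still_idle, s->returned);
}

/* How often a read produced any candidate at all. */
static unsigned long idle_core_return_pct(const struct idle_core_stats *s)
{
	return pct(s->returned, s->reads);
}

/* Write pressure on the shared cache line: updates per 100 reads. */
static unsigned long idle_core_updates_per_100_reads(const struct idle_core_stats *s)
{
	return pct(s->updates, s->reads);
}
```

A low hit percentage would suggest the hint goes stale too quickly,
while a high updates-per-read figure would point at the cache line
bouncing cost. Where the counters get incremented is left open here
because that depends on the series being measured.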

However, I would caution against conflating the "fallback search domain"
with the patches tracking idle core because they should be independent
of each other.

The old patch below no longer applies, but it was the basis for some SIS
work over a year ago.

---8<---