Re: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers

From: Mel Gorman
Date: Tue Oct 26 2021 - 04:18:23 EST


On Mon, Oct 25, 2021 at 08:35:52AM +0200, Mike Galbraith wrote:
> On Fri, 2021-10-22 at 12:05 +0100, Mel Gorman wrote:
> > On Fri, Oct 22, 2021 at 12:26:08PM +0200, Mike Galbraith wrote:
> >
> > >
> > > Patchlet helped hackbench?  That's.. unexpected (at least by me).
> > >
> >
> > I didn't analyse in depth and other machines do not show as dramatic
> > a difference but it's likely due to timings of tasks getting wakeup
> > preempted.
>
> Wakeup tracing made those hackbench numbers less surprising. There's
> tons of wake-many going on. At a glance, it appears to already be bi-
> directional though, so patchlet helping seemingly means that there's
> just not quite enough to tickle the heuristic without a little help.

Another possible explanation is that hackbench overloads a machine to
such an extent that the ratio of bi-directional wakeups is not
sufficient to trigger the wake-wide logic.

> Question is, is the potential reward of strengthening that heuristic
> yet again, keeping in mind that "heuristic" tends to not play well with
> "deterministic", worth the risk?
>
> My desktop trace session said distribution improved a bit, but there
> was no meaningful latency or throughput improvement, making for a
> pretty clear "nope" to the above question.

Another interpretation is that it's simply neutral and does no harm.

> It benefiting NUMA box
> hackbench is a valid indicator, but one that is IMO too disconnected
> from the real world to carry much weight.
>

I think if it's not shown to be harmful to a realistic workload but helps
an overloaded example then it should be ok. While excessive overload is
rare in a realistic workload, it does happen. There are a few workloads
I've seen bugs for that were triggered when an excessive number of worker
threads get spawned and compete for CPU access which in turns leads more
worker threads get spawned. There are application workarounds for this
corner case but it still triggers bugs.

--
Mel Gorman
SUSE Labs