Re: [PATCH 4/4] sched/fair: Use a recently used CPU as an idle candidate and the basis for SIS

From: Peter Zijlstra
Date: Tue Jan 30 2018 - 06:51:15 EST


On Tue, Jan 30, 2018 at 10:45:55AM +0000, Mel Gorman wrote:
> The select_idle_sibling (SIS) rewrite in commit 10e2f1acd010 ("sched/core:
> Rewrite and improve select_idle_siblings()") replaced a domain iteration
> with a search that broadly speaking does a wrapped walk of the scheduler
> domain sharing a last-level-cache. While this had a number of improvements,
> one consequence is that two tasks that share a waker/wakee relationship push
> each other around a socket. Even though two tasks may be active, all cores
> are evenly used. This is great from a search perspective and spreads a load
> across individual cores but it has adverse consequences for cpufreq. As each
> CPU has relatively low utilisation, cpufreq may decide the utilisation is
> too low to use a higher P-state and overall computation throughput suffers.

> While individual cpufreq and cpuidle drivers may compensate by artificially
> boosting P-state (at C0) or avoiding lower C-states (during idle), it does
> not help if hardware-based cpufreq (e.g. HWP) is used.

Not saying this patch is bad; but Rafael / Srinivas we really should do
better. Why isn't cpufreq (esp. sugov) fixing this? HWP or not, we can
still give it hints, and it looks like we're not doing that.

Mel, what hardware are you testing this on?