Re: sched: tweak select_idle_sibling to look for idle threads

From: Peter Zijlstra
Date: Tue May 03 2016 - 10:32:34 EST

On Mon, May 02, 2016 at 11:47:25AM -0400, Chris Mason wrote:
> On Mon, May 02, 2016 at 04:58:17PM +0200, Peter Zijlstra wrote:
> > On Mon, May 02, 2016 at 04:50:04PM +0200, Mike Galbraith wrote:
> > > Oh btw, did you know single socket boxen have no sd_busy? That doesn't
> > > look right.
> >
> > I suspected; didn't bother looking at it yet. The 'problem' is that the LLC
> > domain is the top-most, so it doesn't have a parent domain. I'm sure we
> > can come up with something if we can get this all working right.
> >
> > And yes, I can get gains on various workloads with various options, I
> > can even break all workloads, but I've so far completely failed on
> > getting a win for everyone :/
> Adding in the task_hot() check to decide if scanning idle was a good
> idea ended up being really important.

So I'm conflicted on this patch:

+static int bounce_to_target(struct task_struct *p, int cpu)
+{
+	s64 delta;
+
+	/*
+	 * As the run queue gets bigger, it's more and more likely that
+	 * balance will have distributed things for us, and less likely
+	 * that scanning all our CPUs for an idle one will find one.
+	 * So, if nr_running > 1, just call this CPU good enough.
+	 */
+	if (cpu_rq(cpu)->cfs.nr_running > 1)
+		return 1;
+
+	/* taken from task_hot() */
+	delta = rq_clock_task(task_rq(p)) - p->se.exec_start;
+	return delta < (s64)sysctl_sched_migration_cost;
+}

This will work for your schbench workload because it sleeps for 30ms while
the migration_cost thingy is 500us; therefore you'll trigger the full
LLC scan.
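To make that arithmetic concrete, here is a small userspace model of the check. It is only a sketch: the function name model_bounce_to_target() is hypothetical, the plain nanosecond arguments stand in for rq_clock_task() and p->se.exec_start, and the 500us constant assumes the default value of sysctl_sched_migration_cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default sysctl_sched_migration_cost: 500us, in nanoseconds. */
#define MIGRATION_COST_NS 500000LL

/*
 * Hypothetical userspace model of the patch's bounce_to_target():
 * returns true when the caller should stay on the target CPU instead
 * of scanning the LLC for an idle one.
 *   now_ns        stands in for rq_clock_task(task_rq(p))
 *   exec_start_ns stands in for p->se.exec_start
 */
static bool model_bounce_to_target(unsigned int nr_running,
				   int64_t now_ns, int64_t exec_start_ns)
{
	int64_t delta;

	/* Clock is monotonic, so the task cannot have started in the future. */
	assert(now_ns >= exec_start_ns);

	/* Busy target runqueue: assume balancing will spread things out. */
	if (nr_running > 1)
		return true;

	/* "Cache hot" test borrowed from task_hot(). */
	delta = now_ns - exec_start_ns;
	return delta < MIGRATION_COST_NS;
}
```

A task that slept 30ms gives delta = 30,000,000ns, far above 500,000ns, so the model returns false and the caller falls through to the full LLC scan; a wakeup after 100us returns true and stays on target.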

_However_, the migration_cost is supposed to model the cost of leaving
the LLC, so testing against that here seems wrong.

Let me go play with something that measures the cost of doing that LLC
scan and compares that against the sleepy time -- of course, now I need to
go figure out how to do this clock thing without rq-lock pain.
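One way to picture that direction is a running average of how long an LLC scan takes, compared against how long the task slept. This is a minimal sketch under loud assumptions: the struct, both function names, the 1/8 averaging weight, and the use of sleep time as the proxy are all hypothetical, and it sidesteps the clock/rq-lock question by taking plain nanosecond values as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-LLC state: running average of idle-scan cost. */
struct llc_scan_stats {
	uint64_t avg_scan_cost_ns;
};

/* Fold one measured scan duration into the average with weight 1/8. */
static void update_scan_cost(struct llc_scan_stats *st, uint64_t sample_ns)
{
	int64_t avg, diff;

	/* Keep the signed arithmetic below from overflowing. */
	assert(sample_ns <= (uint64_t)INT64_MAX);
	assert(st->avg_scan_cost_ns <= (uint64_t)INT64_MAX);

	avg = (int64_t)st->avg_scan_cost_ns;
	diff = (int64_t)sample_ns - avg;
	st->avg_scan_cost_ns = (uint64_t)(avg + diff / 8);
}

/*
 * Hypothetical decision: scan only when the task's sleep time exceeds
 * what a scan of the LLC has been costing on average.
 */
static bool worth_scanning(const struct llc_scan_stats *st, uint64_t sleep_ns)
{
	return sleep_ns > st->avg_scan_cost_ns;
}
```

The averaging keeps one slow scan from dominating the estimate; the hard part noted above, getting a cheap timestamp for both the scan and the sleep without taking the rq lock, is not addressed here.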

+ if (package_sd && !bounce_to_target(p, target)) {
+ for_each_cpu_and(i, sched_domain_span(package_sd), tsk_cpus_allowed(p)) {
+ if (idle_cpu(i)) {
+ target = i;
+ break;
+ }
+ }
+ }
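The hunk above takes the first idle CPU in the intersection of the domain span and the task's affinity mask, and otherwise leaves target alone. A userspace model of that loop follows; the 64-bit masks standing in for struct cpumask and the name pick_idle_cpu() are assumptions for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the scan: walk CPUs that are in both the
 * domain span and the allowed mask, in ascending order (as
 * for_each_cpu_and() does), and return the first idle one.
 * Bit i of each mask represents CPU i.
 */
static int pick_idle_cpu(uint64_t span, uint64_t allowed,
			 uint64_t idle, int target)
{
	uint64_t candidates = span & allowed;

	assert(target >= 0 && target < 64);

	for (int i = 0; i < 64; i++) {
		if (!(candidates & (UINT64_C(1) << i)))
			continue;	/* not in span, or not allowed */
		if (idle & (UINT64_C(1) << i))
			return i;	/* first idle candidate wins */
	}
	return target;			/* nothing idle: keep the target */
}
```

An idle CPU outside the task's allowed mask, or outside the span, is never picked; in both cases the model returns the original target.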

Also note your s/sd/package_sd/ rename is, strictly speaking, wrong.
Sure, on your current Intel system the LLC is the entire package, but
this is not true in general.

Take for instance the Intel Core2Quad and AMD Bulldozer thingies; they
had two dies in one package, and correspondingly two LLC domains in one
package.
(Also, the Intel Cluster-on-Die mode can split the LLC in two.)
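A toy topology shows why the name matters. The tables below are a hypothetical Core2Quad-style layout (one package, two dies, each die with its own shared L2 acting as the LLC); the CPU numbering, table names and span_of() helper are illustrative only, not read from real hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4

/*
 * Hypothetical layout: all four CPUs in package 0, CPUs 0-1 sharing
 * one LLC and CPUs 2-3 sharing the other.
 */
static const int package_id[NR_CPUS_MODEL] = { 0, 0, 0, 0 };
static const int llc_id[NR_CPUS_MODEL]     = { 0, 0, 1, 1 };

/* Mask of every CPU whose entry in ids[] matches that of cpu. */
static uint64_t span_of(const int ids[], int cpu)
{
	uint64_t mask = 0;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		if (ids[i] == ids[cpu])
			mask |= UINT64_C(1) << i;
	return mask;
}
```

Here CPU 0's LLC span is 0x3 while its package span is 0xf. Assuming the renamed variable still holds the LLC domain, a scan over "package_sd" on such a part covers only half the package.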

There were also the old P6-era SMP boards which had external LLC, where
you could have an LLC shared across multiple packages -- although I'm
thinking we'll never see that again, due to off package being far
toooooo slooooooow these days.