Re: [RFC] The Linux Scheduler: a Decade of Wasted Cores Report

From: Rik van Riel
Date: Mon Apr 25 2016 - 13:54:18 EST


On Mon, 2016-04-25 at 11:34 +0200, Peter Zijlstra wrote:
> On Sat, Apr 23, 2016 at 06:38:25PM -0700, Brendan Gregg wrote:
>
> > Their proof of concept patches are online[1]. I tested them and saw 0%
> > improvements on the systems I tested, for some simple workloads[2]. I
> > tested 1 and 2 node NUMA, as that is typical for my employer (Netflix,
> > and our tens of thousands of Linux instances in the AWS/EC2 cloud),
> > even though I wasn't expecting any difference on 1 node. I've used
> > synthetic workloads so far.
>
> So their setup uses a bigger (not fully connected) NUMA topology, and
> I'm not entirely sure how much of their problems are due to that, but
> at least one of them is.
>
> Such boxes are fairly rare.

Their proposed fix, making sure we build all 8 sched
groups with 5 nodes each, seems a bit roundabout
compared with a simpler alternative, though.

When dealing with a NUMA_GLUELESS_MESH topology, we
should simply not build any sched domains with multiple
nodes inside them, except for the top level domain that
contains all the nodes.

At that point, we will balance between threads, inside
each core, and between all nodes, without running into
those pointless (and potentially harmful) intermediate
sched domains.
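
To make the idea concrete, here is a rough userspace sketch, not
kernel code. It assumes a made-up 8-node topology in which node i is
one hop from i±1 and i±2 (mod 8) and two hops from everything else,
so every node sees exactly 5 nodes (itself included) within one hop.
The names hops(), spans_at() and numa_levels() are hypothetical and
do not correspond to anything in kernel/sched.

```python
# Hypothetical 8-node mesh, chosen only for illustration: node i is one
# hop from i±1 and i±2 (mod 8) and two hops from all remaining nodes.
N = 8
ALL_NODES = frozenset(range(N))


def hops(a, b):
    """Hop count between nodes a and b in the assumed topology."""
    d = min((a - b) % N, (b - a) % N)
    if d == 0:
        return 0
    return 1 if d <= 2 else 2


def spans_at(level):
    """For each node, the set of nodes reachable within `level` hops."""
    return [frozenset(b for b in range(N) if hops(a, b) <= level)
            for a in range(N)]


def numa_levels(skip_intermediate):
    """Return [(level, per-node spans)] for the NUMA levels to build.

    With skip_intermediate=True, any level whose spans do not cover
    every node is dropped, leaving only the top-level domain.
    """
    distances = {hops(a, b) for a in range(N) for b in range(N)} - {0}
    levels = []
    for level in sorted(distances):
        spans = spans_at(level)
        is_top = all(s == ALL_NODES for s in spans)
        if skip_intermediate and not is_top:
            continue
        levels.append((level, spans))
    return levels


default = numa_levels(skip_intermediate=False)
proposed = numa_levels(skip_intermediate=True)
```

In this toy model the default build ends up with an intermediate
level holding 8 different, overlapping 5-node spans, while the
proposed variant keeps only the single level that spans all nodes.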

--
All Rights Reversed.
