Re: [PATCHSET for-4.11] cgroup: implement cgroup v2 thread mode

From: Mike Galbraith
Date: Tue Mar 14 2017 - 10:51:05 EST


On Mon, 2017-03-13 at 15:26 -0400, Tejun Heo wrote:
> Hello, Mike.
>
> Sorry about the long delay.
>
> On Mon, Feb 13, 2017 at 06:45:07AM +0100, Mike Galbraith wrote:
> > > > So, as long as the depth stays reasonable (single digit or lower),
> > > > what we try to do is keeping tree traversal operations aggregated or
> > > > located on slow paths. There still are places that this overhead
> > > > shows up (e.g. the block controllers aren't too optimized) but it
> > > > isn't particularly difficult to make a handful of layers not matter at
> > > > all.
> > >
> > > A handful of cpu bean counting layers stings considerably.
>
> Hmm... yeah, I was trying to think about ways to avoid full scheduling
> overhead at each layer (the scheduler does a lot per each layer of
> scheduling) but don't think it's possible to circumvent that without
> introducing a whole lot of scheduling artifacts.

Yup.

> In a lot of workloads, the added overhead from several layers of CPU
> controllers doesn't seem to get in the way too much (most threads do
> something other than scheduling after all).

Sure, don't schedule a lot, it doesn't hurt much, but there are plenty
of loads that routinely do schedule a LOT, and there it matters a LOT..
which is why network benchmarks tend to be severely allergic to
scheduler lard.

> The only major issue that
> we're seeing in the fleet is the cgroup iteration in idle rebalancing
> code pushing up the scheduling latency too much but that's a different
> issue.

Hm, I would suspect PELT to be the culprit there. It helps smooth out
load balancing, but will stack "skinny looking" tasks.

-Mike