Re: RFC: documentation of the autogroup feature [v2]

From: Peter Zijlstra
Date: Fri Nov 25 2016 - 11:06:07 EST


On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote:
> >> ┌─────────────────────────────────────────────────────┐
> >> │FIXME                                                │
> >> ├─────────────────────────────────────────────────────┤
> >> │How do the nice value of a process and the nice      │
> >> │value of an autogroup interact? Which has priority?  │
> >> │                                                     │
> >> │It *appears* that the autogroup nice value is used   │
> >> │for CPU distribution between task groups, and that   │
> >> │the process nice value has no effect there. (I.e.,   │
> >> │suppose two autogroups each contain a CPU-bound      │
> >> │process, with one process having nice==0 and the     │
> >> │other having nice==19. It appears that they each     │
> >> │get 50% of the CPU.) It appears that the process     │
> >> │nice value has effect only with respect to           │
> >> │scheduling relative to other processes in the *same* │
> >> │autogroup. Is this correct?                          │
> >> └─────────────────────────────────────────────────────┘
> >
> > Yup, entity nice level affects distribution among peer entities.
>
> Huh! I only just learned about this via my experiments while
> investigating autogroups.
>
> How long have things been like this? Always? (I don't think
> so.) Since the arrival of CFS? Since the arrival of
> autogrouping? (I'm guessing not.) Since some other point?
> (When?)

Ever since cfs-cgroup, this is a fundamental design point of cgroups,
and has therefore always been the case for autogroups (as that is
nothing more than an application of the cgroup code).

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

It's really rather fundamental to how the whole hierarchical thing
works.

CFS is a weighted fair queueing scheduler; this means each entity
receives:

                    w_i
        dt_i = dt --------
                  \Sum w_j


               CPU
         ______/ \______
        /    |     |    \
       A     B     C     D


So if each entity {A,B,C,D} has equal weight, then they will receive
equal time. Explicitly, for C you get:


                            w_C
        dt_C = dt -----------------------
                  (w_A + w_B + w_C + w_D)
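
As a quick numeric check of the formula above, here is a minimal Python
sketch. The name wfq_shares() and the weight values are made up for
illustration; only the ratio between the weights matters.

```python
from fractions import Fraction


def wfq_shares(weights):
    """Return each entity's fraction of the period: w_i / \\Sum w_j."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# Four peers with equal (illustrative) weights: each gets 1/4.
equal = wfq_shares({"A": 1024, "B": 1024, "C": 1024, "D": 1024})

# Same peers, but C's weight is doubled: C gets 2/5, the others 1/5 each.
skewed = wfq_shares({"A": 1024, "B": 1024, "C": 2048, "D": 1024})

for name in "ABCD":
    print(name, equal[name], skewed[name])
```

Exact fractions are used so the shares always sum to exactly 1.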


Extending this to a hierarchy, we get:


               CPU
         ______/ \______
        /    |     |    \
       A     B     C     D
                  / \
                 E   F

Where C becomes a 'server' for entities {E,F}. The weight of C does not
depend on its child entities. This way the time of {E,F} becomes a
straight product of their ratio with C. That is; the whole thing
becomes, where l denotes the level in the hierarchy and i an
entity on that level:

                      l      w_g,i
        dt_l,i = dt \Prod  ----------
                     g=0   \Sum w_g,j


Or more concretely, for E:

                            w_E
        dt_1,E = dt_0,C -----------
                        (w_E + w_F)

                              w_C                   w_E
               = dt -----------------------  -----------
                    (w_A + w_B + w_C + w_D)  (w_E + w_F)
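
The product form can be sketched the same way. This only illustrates
the arithmetic, not how the kernel computes anything: the nested-dict
tree layout and hierarchical_shares() are hypothetical, and the sketch
assumes a single CPU with every entity always runnable. The weights
1024 and 15 are, as far as I know, what the kernel's nice-to-weight
table assigns to nice 0 and nice 19; treat them as illustrative.

```python
from fractions import Fraction


def hierarchical_shares(entities, parent_share=Fraction(1)):
    """Map each entity name to its share of total CPU time.

    `entities` maps name -> (weight, children), where `children` has the
    same shape. An entity's share is its parent's share times its weight
    ratio among its peers. Names are assumed to be unique.
    """
    total = sum(weight for weight, _ in entities.values())
    shares = {}
    for name, (weight, children) in entities.items():
        share = parent_share * Fraction(weight, total)
        shares[name] = share
        shares.update(hierarchical_shares(children, share))
    return shares


# The tree from the diagram: C acts as a server for E and F.
tree = {
    "A": (1024, {}),
    "B": (1024, {}),
    "C": (1024, {"E": (1024, {}), "F": (1024, {})}),
    "D": (1024, {}),
}
shares = hierarchical_shares(tree)  # E gets 1/4 * 1/2 = 1/8

# Two equal-weight groups, one CPU-bound task each (nice 0 and nice 19).
# The task weights never meet, so each task ends up with 1/2.
two_groups = hierarchical_shares({
    "g0": (1024, {"p_nice0": (1024, {})}),
    "g1": (1024, {"p_nice19": (15, {})}),
})

# The same two tasks as peers inside one group: now nice matters.
one_group = hierarchical_shares({
    "g": (1024, {"p_nice0": (1024, {}), "p_nice19": (15, {})}),
})
```

The second case is the situation described in the quoted FIXME: with
one task per group, the task nice values have no peer to be weighed
against, so the split is decided by the group weights alone.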


And this 'trivially' extends to SMP, with the tricky bit being that the
sums over all entities end up being machine wide, instead of per CPU,
which is a real and royal pain for performance.


Note that this property, where the weight of the server entity is
independent of its child entities, is a desired feature. Without it,
it would be impossible to control the relative weights of groups, and
that is the sole parameter of the WFQ model.

It is also why Linus so likes autogroups: each session competes on
equal terms with the others.