Re: [PATCH v9 12/16] sched/core: uclamp: Extend CPU's cgroup controller

From: Patrick Bellasi
Date: Mon Jun 03 2019 - 08:31:20 EST


On 31-May 08:35, Tejun Heo wrote:

[...]

> > These attributes:
> >
> > a) are available only for non-root nodes, both on default and legacy
> > hierarchies, while system wide clamps are defined by a generic
> > interface which does not depends on cgroups. This system wide
> > interface enforces constraints on tasks in the root node.
>
> I'd much prefer if they weren't entangled this way. The system wide
> limits should work the same regardless of cgroup's existence. cgroup
> can put further restriction on top but mere creation of cgroups with
> cpu controller enabled shouldn't take them out of the system-wide
> limits.

That's correct and what you describe matches, at least on its
intents, the current implementation provided in:

[PATCH v9 14/16] sched/core: uclamp: Propagate system defaults to root group
https://lore.kernel.org/lkml/20190515094459.10317-15-patrick.bellasi@xxxxxxx/

System clamps always work the same way, independently from cgroups:
they define the upper bound for both min and max clamps.

When cgroups are not available, tasks specific clamps are always
capped by system clamps.

When cgroups are available, the root task group clamps are capped by
the system clamps, which affects its "effective" clamps and propagate
them down the hierarchy to child's "effective" clamps.
That's done in:

[PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@xxxxxxx/


Example 1
---------

Here is an example of system and groups clamps aggregation:

min max
system defaults 400 600

cg_name min min.effective max max.effective
/uclamp 1024 400 500 500
/uclamp/app 512 400 512 500
/uclamp/app/pid_smalls 100 100 200 200
/uclamp/app/pid_bigs 500 400 700 500


The ".effective" clamps are used to define the actual clamp value to
apply to tasks, according to the aggregation rules defined in:

[PATCH v9 15/16] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
https://lore.kernel.org/lkml/20190515094459.10317-16-patrick.bellasi@xxxxxxx/

All the above, to me it means that:
- cgroups are always capped by system clamps
- cgroups can further restrict system clamps

Does that match with your view?

> > b) enforce effective constraints at each level of the hierarchy which
> > are a restriction of the group requests considering its parent's
> > effective constraints. Root group effective constraints are defined
> > by the system wide interface.
> > This mechanism allows each (non-root) level of the hierarchy to:
> > - request whatever clamp values it would like to get
> > - effectively get only up to the maximum amount allowed by its parent
>
> I'll come back to this later.
>
> > c) have higher priority than task-specific clamps, defined via
> > sched_setattr(), thus allowing to control and restrict task requests
>
> This sounds good.
>
> > Add two new attributes to the cpu controller to collect "requested"
> > clamp values. Allow that at each non-root level of the hierarchy.
> > Validate local consistency by enforcing util.min < util.max.
> > Keep it simple by do not caring now about "effective" values computation
> > and propagation along the hierarchy.
>
> So, the followings are what we're doing for hierarchical protection
> and limit propgations.
>
> * Limits (high / max) default to max. Protections (low / min) 0. A
> new cgroup by default doesn't constrain itself further and doesn't
> have any protection.

Example 2
---------

Let say we have:

/tg1:
util_min=200 (as a protection)
util_max=800 (as a limit)

the moment we create a subgroup /tg1/tg11, in v9 it is initialized
with the same limits _and protections_ of its father:

/tg1/tg11:
util_min=200 (protection inherited from /tg1)
util_max=800 (limit inherited from /tg1)

Do you mean that we should have instead:

/tg1/tg11:
util_min=0 (no protection by default at creation time)
util_max=800 (limit inherited from /tg1)


i.e. we need to reset the protection of a newly created subgroup?


> * A limit defines the upper ceiling for the subtree. If an ancestor
> has a limit of X, none of its descendants can have more than X.

That's correct, however we distinguish between "requested" and
"effective" values.


Example 3
---------

We can have:

cg_name max max.effective
/uclamp/app 400 400
/uclamp/app/pid_bigs 500 400


Which means that a subgroup can "request" a limit (max=500) higher
then its father (max=400), while still getting only up to what its
father allows (max.effective = 400).


Example 4
---------

Tracking the actual requested limit (max=500) it's useful to enforce
it once the father limit should be relaxed, for example we will have:

cg_name max max.effective
/uclamp/app 600 600
/uclamp/app/pid_bigs 500 500

where a subgroup gets not more than what it has been configured for.

This is the logic implemented by cpu_util_update_eff() in:

[PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@xxxxxxx/


> * A protection defines the upper ceiling of protections for the
> subtree. If an andester has a protection of X, none of its
> descendants can have more protection than X.

Right, that's the current behavior in v9.

> Note that there's no way for an ancestor to enforce protection its
> descendants. It can only allow them to claim some. This is
> intentional as the other end of the spectrum is either descendants
> losing the ability to further distribute protections as they see fit.

Ok, that means I need to update in v10 the initialization of subgroups
min clamps to be none by default as discussed in the above Example 2,
right?

[...]

Cheers,
Patrick

--
#include <best/regards.h>

Patrick Bellasi