Re: [PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups

From: Patrick Bellasi
Date: Wed Sep 12 2018 - 10:19:32 EST


On 09-Sep 11:52, Suren Baghdasaryan wrote:
> On Tue, Aug 28, 2018 at 6:53 AM, Patrick Bellasi
> <patrick.bellasi@xxxxxxx> wrote:

[...]

> > +/**
> > + * release_uclamp_sched_group: release utilization clamp references of a TG
>
> free_uclamp_sched_group

+1

> > + * @tg: the task group being removed
> > + *
> > + * An empty task group can be removed only when it has no more tasks or child
> > + * groups. This means that we can also safely release all the reference
> > + * counting to clamp groups.
> > + */
> > +static inline void free_uclamp_sched_group(struct task_group *tg)
> > +{

[...]

> > @@ -1417,9 +1444,18 @@ static void __init init_uclamp(void)
> > #ifdef CONFIG_UCLAMP_TASK_GROUP
> > /* Init root TG's clamp group */
> > uc_se = &root_task_group.uclamp[clamp_id];
> > +
> > uc_se->effective.value = uclamp_none(clamp_id);
> > - uc_se->value = uclamp_none(clamp_id);
> > - uc_se->group_id = 0;
> > + uc_se->effective.group_id = 0;
> > +
> > + /*
> > + * The max utilization is always allowed for both clamps.
> > + * This is required to not force a null minimum utiliation on
> > + * all child groups.
> > + */
> > + uc_se->group_id = UCLAMP_NOT_VALID;
> > + uclamp_group_get(NULL, clamp_id, 0, uc_se,
> > + uclamp_none(UCLAMP_MAX));
>
> I don't quite get why you are using uclamp_none(UCLAMP_MAX) for both
> UCLAMP_MIN and UCLAMP_MAX clamps. I assume the comment above is to
> explain this but I'm still unclear why this is done.

That's maybe a bit tricky to get but, this will not happen since for
root group tasks we apply the system default values... which however
are introduced by one of the following patches 11/16.

So, my understanding of the "delegation model" is that for cgroups we
have to ensure each TG is a "restriction" of its parent. Thus:

tg::util_min <= tg_parent::util_min

This is required to ensure that a tg_parent can always restrict
resources on its descendants. I guess that's required to have a sane
usage of CGroups for VMs where the Host can transparently control its
Guests.

According to the above rule, and considering that root task group
cannot be modified, to allow boosting on TG we are forced to set the
root group with util_min = SCHED_CAPACITY_SCALE.

Moreover, Tejun pointed out that if we need tuning at root TG level,
it means that we need system wide tunable, which should be available
also when CGroups are not in use.

That's why on patch:

[PATCH v4 11/16] sched/core: uclamp: add system default clamps
https://lore.kernel.org/lkml/20180828135324.21976-12-patrick.bellasi@xxxxxxx/

we add the concept of system default clamps which are actually
initialized with util_min=0, i.e. 0% boost by default.

These system default clamp values applies to tasks which are running
either in the root task group on in an autogroup, which also cannot be
tuned at run-time, whenever the task has not a task specific clamp
value specified.

All that considered, the code above is still confusing and I should
consider moving to patch 11/16 the initialization to UCLAMP_MAX for
util_min...

> Maybe expand the comment to explain the intention?

... and add there something like:

/*
* The max utilization is always allowed for both clamps.
* This satisfies the "delegation model" required by CGroups
* v2, where a child task group cannot have more resources then
* its father, thus allowing the creation of child groups with
* a non null util_min.
* For tasks within the root_task_group we will use the system
* default clamp values anyway, thus they will not be boosted
* to the max utilization by default.
*/

It this more clear ?


> With this I think all TGs will get boosted by default, won't they?

You right, at cgroup creation time we clone parent's clamps... thus,
all root_task_group's children group will get max boosting at creation
time. However, since we don't have task within a newly created task
group, the system management software can still refine the clamps
before staring to move tasks in there.

Do you think we should initialize root task group childrens
differently ? I would prefer to avoid special cases if not strictly
required...

Cheers,
Patrick

--
#include <best/regards.h>

Patrick Bellasi