Re: [PATCH v8 12/16] sched/core: uclamp: Extend CPU's cgroup controller

From: Patrick Bellasi
Date: Tue May 07 2019 - 07:43:31 EST


On 17-Apr 17:12, Suren Baghdasaryan wrote:
> On Tue, Apr 2, 2019 at 3:43 AM Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> >
> > The cgroup CPU bandwidth controller allows assigning a specified
> > (maximum) bandwidth to the tasks of a group. However, this bandwidth is
> > defined and enforced only on a temporal basis, without considering the
> > actual frequency a CPU is running at. Thus, the amount of computation
> > completed by a task within an allocated bandwidth can be very different
> > depending on the actual frequency at which the CPU runs that task.
> > The amount of computation can be affected also by the specific CPU a
> > task is running on, especially when running on asymmetric capacity
> > systems like Arm's big.LITTLE.
> >
> > With the availability of schedutil, the scheduler is now able
> > to drive frequency selections based on actual task utilization.
> > Moreover, the utilization clamping support provides a mechanism to
> > bias the frequency selection operated by schedutil depending on
> > constraints assigned to the tasks currently RUNNABLE on a CPU.
> >
> > Given the mechanisms described above, it is now possible to extend the
> > cpu controller to specify the minimum (or maximum) utilization which
> > should be considered for tasks RUNNABLE on a cpu.
> > This makes it possible to better define the actual computational
> > power assigned to task groups, thus improving the cgroup CPU bandwidth
> > controller which is currently based just on time constraints.
> >
> > Extend the CPU controller with a couple of new attributes util.{min,max}
> > which allow enforcing utilization boosting and capping for all the
> > tasks in a group. Specifically:
> >
> > - util.min: defines the minimum utilization which should be considered
> >             i.e. the RUNNABLE tasks of this group will run at least at a
> >             minimum frequency which corresponds to the util.min
> >             utilization
> >
> > - util.max: defines the maximum utilization which should be considered
> >             i.e. the RUNNABLE tasks of this group will run up to a
> >             maximum frequency which corresponds to the util.max
> >             utilization
> >
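
To make the "frequency which corresponds to the util.min utilization"
wording concrete, here is a minimal standalone sketch. It assumes a
simplified schedutil-style mapping (a request of 1.25 * max_freq * util /
1024, capped at max_freq). clamp_util() and util_to_freq() are hypothetical
helper names used only for illustration, not functions from this series.

```c
#include <assert.h>

#define UTIL_SCALE 1024ULL	/* assumed full-scale utilization value */

/* Clamp a utilization value into the [util_min, util_max] range. */
static unsigned long long clamp_util(unsigned long long util,
				     unsigned long long util_min,
				     unsigned long long util_max)
{
	assert(util_min <= util_max);

	if (util < util_min)
		return util_min;
	if (util > util_max)
		return util_max;
	return util;
}

/*
 * Simplified, assumed mapping from utilization to a frequency request:
 * 1.25 * max_freq * util / UTIL_SCALE, capped at max_freq.
 */
static unsigned long long util_to_freq(unsigned long long util,
				       unsigned long long max_freq)
{
	unsigned long long freq;

	freq = (max_freq + (max_freq >> 2)) * util / UTIL_SCALE;
	return freq < max_freq ? freq : max_freq;
}
```

Under these assumptions, on a CPU with a 2000000 kHz maximum, a task with
utilization 100 in a group with util.min = 512 gets the same 1250000 kHz
request as a task with utilization 900 in a group with util.max = 512.
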
> > These attributes:
> >
> > a) are available only for non-root nodes, both on default and legacy
> >    hierarchies, while system wide clamps are defined by a generic
> >    interface which does not depend on cgroups. This system wide
> >    interface enforces constraints on tasks in the root node.
> >
> > b) enforce effective constraints at each level of the hierarchy which
> >    are a restriction of the group requests considering its parent's
> >    effective constraints. Root group effective constraints are defined
> >    by the system wide interface.
> >    This mechanism allows each (non-root) level of the hierarchy to:
> >    - request whatever clamp values it would like to get
> >    - effectively get only up to the maximum amount allowed by its parent
> >
> > c) have higher priority than task-specific clamps, defined via
> >    sched_setattr(), thus making it possible to control and restrict
> >    task requests
> >
> > Add two new attributes to the cpu controller to collect "requested"
> > clamp values. Allow that at each non-root level of the hierarchy.
> > Validate local consistency by enforcing util.min < util.max.
> > Keep it simple by not caring, for now, about "effective" values
> > computation and propagation along the hierarchy.
> >
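
As an illustration of the "requested" versus "effective" distinction
described in the changelog, here is a rough standalone sketch. The struct and
function names are hypothetical and not taken from the patch, and the error
codes are an assumption. It also assumes that a request equal to util.max is
accepted. The effective-value helper only illustrates point b); this patch
itself deliberately does not compute effective values.

```c
#include <errno.h>

#define UTIL_SCALE 1024U	/* upper bound of the [0, 1024] range */

/* Hypothetical per-group "requested" clamp values. */
struct clamp_req {
	unsigned int min;
	unsigned int max;
};

/* Validate and store a new util.min request for a group. */
static int set_util_min(struct clamp_req *req, unsigned int value)
{
	if (value > UTIL_SCALE)
		return -ERANGE;		/* outside [0, 1024] */
	if (value > req->max)
		return -EINVAL;		/* would break local min/max consistency */

	req->min = value;
	return 0;
}

/*
 * A group can request any value, but effectively gets only up to
 * what its parent's effective value allows.
 */
static unsigned int effective_clamp(unsigned int requested,
				    unsigned int parent_effective)
{
	return requested < parent_effective ? requested : parent_effective;
}
```

For example, a child requesting 800 under a parent whose effective value is
512 would effectively get 512, while a request of 300 would be granted as is.
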
> > Signed-off-by: Patrick Bellasi <patrick.bellasi@xxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Tejun Heo <tj@xxxxxxxxxx>
> >
> > --
> > Changes in v8:
> > Message-ID: <20190214154817.GN50184@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
> > - update changelog description for points b), c) and following paragraph
> > ---
> > Documentation/admin-guide/cgroup-v2.rst | 27 +++++
> > init/Kconfig | 22 ++++
> > kernel/sched/core.c | 142 +++++++++++++++++++++++-
> > kernel/sched/sched.h | 6 +
> > 4 files changed, 196 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 7bf3f129c68b..47710a77f4fa 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -909,6 +909,12 @@ controller implements weight and absolute bandwidth limit models for
> > normal scheduling policy and absolute bandwidth allocation model for
> > realtime scheduling policy.
> >
> > +Cycles distribution is based, by default, on a temporal base and it
> > +does not account for the frequency at which tasks are executed.
> > +The (optional) utilization clamping support allows to enforce a minimum
> > +bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
> > +which should never be exceeded by a CPU.
> > +
> > WARNING: cgroup2 doesn't yet support control of realtime processes and
> > the cpu controller can only be enabled when all RT processes are in
> > the root cgroup. Be aware that system management software may already
> > @@ -974,6 +980,27 @@ All time durations are in microseconds.
> > Shows pressure stall information for CPU. See
> > Documentation/accounting/psi.txt for details.
> >
> > + cpu.util.min
> > + A read-write single value file which exists on non-root cgroups.
> > + The default is "0", i.e. no utilization boosting.
> > +
> > + The requested minimum utilization in the range [0, 1024].
> > +
> > + This interface allows reading and setting minimum utilization clamp
> > + values similar to the sched_setattr(2). This minimum utilization
> > + value is used to clamp the task specific minimum utilization clamp.
> > +
> > + cpu.util.max
> > + A read-write single value file which exists on non-root cgroups.
> > + The default is "1024", i.e. no utilization capping.
> > +
> > + The requested maximum utilization in the range [0, 1024].
> > +
> > + This interface allows reading and setting maximum utilization clamp
> > + values similar to the sched_setattr(2). This maximum utilization
> > + value is used to clamp the task specific maximum utilization clamp.
> > +
> > +
> >
> > Memory
> > ------
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 7439cbf4d02e..33006e8de996 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -877,6 +877,28 @@ config RT_GROUP_SCHED
> >
> > endif #CGROUP_SCHED
> >
> > +config UCLAMP_TASK_GROUP
> > + bool "Utilization clamping per group of tasks"
> > + depends on CGROUP_SCHED
> > + depends on UCLAMP_TASK
> > + default n
> > + help
> > + This feature enables the scheduler to track the clamped utilization
> > + of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> > +
> > + When this option is enabled, the user can specify a min and max
> > + CPU bandwidth which is allowed for each single task in a group.
> > + The max bandwidth allows to clamp the maximum frequency a task
> > + can use, while the min bandwidth allows to define a minimum
> > + frequency a task will always use.
> > +
> > + When task group based utilization clamping is enabled, any
> > + task-specific clamp value is constrained by the cgroup-specified
> > + clamp value. Neither the minimum nor the maximum task clamp can
> > + be bigger than the corresponding clamp defined at task group level.
> > +
> > + If in doubt, say N.
> > +
> > config CGROUP_PIDS
> > bool "PIDs controller"
> > help
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 71c9dd6487b1..aeed2dd315cc 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1130,8 +1130,12 @@ static void __init init_uclamp(void)
> > /* System defaults allow max clamp values for both indexes */
> > uc_max.value = uclamp_none(UCLAMP_MAX);
> > uc_max.bucket_id = uclamp_bucket_id(uc_max.value);
> > - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> > + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> > uclamp_default[clamp_id] = uc_max;
> > +#ifdef CONFIG_UCLAMP_TASK_GROUP
> > + root_task_group.uclamp_req[clamp_id] = uc_max;
> > +#endif
> > + }
> > }
> >
> > #else /* CONFIG_UCLAMP_TASK */
> > @@ -6720,6 +6724,19 @@ void ia64_set_curr_task(int cpu, struct task_struct *p)
> > /* task_group_lock serializes the addition/removal of task groups */
> > static DEFINE_SPINLOCK(task_group_lock);
> >
> > +static inline int alloc_uclamp_sched_group(struct task_group *tg,
> > + struct task_group *parent)
> > +{
> > +#ifdef CONFIG_UCLAMP_TASK_GROUP
> > + int clamp_id;
> > +
> > + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
> > + tg->uclamp_req[clamp_id] = parent->uclamp_req[clamp_id];
> > +#endif
> > +
> > + return 1;
>
> Looks like you never return anything else, either here or in the
> following patches, I think...

That's right, I just preferred to keep the same structure in the
callsite below...

> > +}
> > +
> > static void sched_free_group(struct task_group *tg)
> > {
> > free_fair_sched_group(tg);
> > @@ -6743,6 +6760,9 @@ struct task_group *sched_create_group(struct task_group *parent)
> > if (!alloc_rt_sched_group(tg, parent))
> > goto err;
> >
> > + if (!alloc_uclamp_sched_group(tg, parent))
> > + goto err;
> > +

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

... under the assumption the compiler is smart enough to optimize that.

But perhaps it's less confusing to just use void, will update in v9.
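
A rough sketch of what the void variant could look like, not necessarily
what v9 will contain. To keep it compilable on its own, the struct
definitions below are minimal stand-ins for the kernel types, and the
CONFIG_UCLAMP_TASK_GROUP guard from the patch is left out.

```c
#define UCLAMP_CNT 2	/* stand-in: one min clamp and one max clamp */

/* Minimal stand-ins for the kernel types, with only the fields used here. */
struct uclamp_se {
	unsigned int value;
};

struct task_group {
	struct uclamp_se uclamp_req[UCLAMP_CNT];
};

/* A new group starts with a copy of its parent's requested clamp values. */
static inline void alloc_uclamp_sched_group(struct task_group *tg,
					    struct task_group *parent)
{
	int clamp_id;

	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
		tg->uclamp_req[clamp_id] = parent->uclamp_req[clamp_id];
}
```

The callsite in sched_create_group() would then become a plain call, with no
error check and no "goto err" for this step.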

> > return tg;
> >
> > err:
--
#include <best/regards.h>

Patrick Bellasi