Re: [PATCH v5 14/15] sched/core: uclamp: use TG's clamps to restrict Task's clamps
From: Patrick Bellasi
Date: Mon Oct 29 2018 - 14:47:09 EST
A slightly older version was posted by mistake along with the correct one.
Please comment on:
Message-ID: <20181029183311.29175-17-patrick.bellasi@xxxxxxx>
Sorry for the noise.
On 29-Oct 18:33, Patrick Bellasi wrote:
> When a task's util_clamp value is configured via sched_setattr(2), this
> value has to be properly accounted in the corresponding clamp group
> every time the task is enqueued and dequeued. When cgroups are also in
> use, per-task clamp values have to be aggregated with those of the CPU
> controller's Task Group (TG) in which the task is currently living.
>
> Let's update uclamp_cpu_get() to provide aggregation between the task
> and the TG clamp values. Every time a task is enqueued, it is
> accounted in the clamp group which holds the smaller clamp between the
> task-specific value and its TG's effective value.
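A minimal userspace sketch of the aggregation rule stated above, assuming simplified stand-in types: struct clamp and aggregate_clamp() are hypothetical names, not the kernel's, and a clamp group is reduced to a bare index. It models only the "smaller value wins" rule; it ignores the user_defined refinement the patch adds for tasks that never requested a clamp of their own.

```c
/* Hypothetical, simplified stand-in for a (value, clamp group) pair. */
struct clamp {
	unsigned int value;    /* utilization in [0..1024] */
	unsigned int group_id; /* clamp group this value maps to */
};

/*
 * Aggregate a task-specific clamp with its TG's effective clamp:
 * the smaller value wins, and its clamp group is the one that would
 * be refcounted on the CPU when the task is enqueued.
 */
static struct clamp aggregate_clamp(struct clamp task, struct clamp tg)
{
	return task.value <= tg.value ? task : tg;
}
```

For example, a task asking for a util_max of 800 inside a TG whose effective util_max is 512 ends up refcounted in the TG's 512 group.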
>
> This also mimics what already happens to a task's CPU affinity mask when
> the task is also living in a cpuset. The overall idea is that cgroup
> attributes are always used to restrict the per-task attributes.
>
> Thus, this implementation:
>
> 1. ensures cgroup clamps are always used to restrict task-specific
> requests, i.e. tasks are boosted only up to the effective granted
> value, or clamped at least down to a certain value
> 2. implements a "nice-like" policy, where tasks are still allowed to
> request less than what is enforced by their current TG
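The "nice-like" policy can be sketched as below, mirroring the !uclamp_apply_defaults() branch the patch adds to uclamp_effective_group_id(). struct task_clamp, effective_value() and pct_to_util() are hypothetical helpers; CAPACITY_SCALE mirrors the kernel's SCHED_CAPACITY_SCALE (1024), and clamp groups are left out for brevity.

```c
#include <stdbool.h>

#define CAPACITY_SCALE 1024U /* mirrors SCHED_CAPACITY_SCALE */

/* Hypothetical per-task clamp request, modeled on struct uclamp_se. */
struct task_clamp {
	unsigned int value;
	bool user_defined; /* set when requested via sched_setattr(2) */
};

/*
 * A task without its own request inherits the TG value; a task with
 * its own request keeps it only if it does not exceed the TG value.
 */
static unsigned int effective_value(struct task_clamp t, unsigned int tg_value)
{
	if (!t.user_defined || t.value > tg_value)
		return tg_value;
	return t.value;
}

/* Convert a percentage into the [0..CAPACITY_SCALE] range (truncating). */
static unsigned int pct_to_util(unsigned int pct)
{
	return pct * CAPACITY_SCALE / 100;
}
```

With a TG boosted to 20% (util_min 204), a task with no request inherits 204, a task that explicitly asks for 0 drops its boost to 0, and a task asking for 512 is restricted back to 204.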
>
> For this mechanism to work properly, we exploit the concept of an
> "effective" clamp, which is already used by a TG to track restrictions
> enforced by its parent.
> In this patch we reuse the same variable:
> task_struct::uclamp::effective::group_id
> to track the most restrictive clamp group each task is currently
> subject to, which is thus also the group it is currently refcounted in.
>
> This solution also better decouples the slow path, where task and
> task group clamp values are updated, from the fast path, where the
> most appropriate clamp value is tracked by refcounting clamp groups.
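The slow-path/fast-path split can be sketched as follows, assuming a fixed number of clamp groups per CPU and that a CPU's clamp is the maximum value among groups that still have RUNNABLE tasks (as the max_value update in uclamp_cpu_update() suggests). NR_GROUPS, struct cpu_clamp, struct task_state, clamp_get(), clamp_put() and cpu_clamp_value() are all hypothetical; returning 0 for an idle CPU is a simplification. The point shown is that dequeue drops exactly the group recorded at enqueue, even if the slow path has since changed the task's or TG's clamps.

```c
#include <stdbool.h>

#define NR_GROUPS 5 /* hypothetical number of clamp groups per CPU */

/* Hypothetical per-CPU state: one refcount and one value per group. */
struct cpu_clamp {
	unsigned int tasks[NR_GROUPS]; /* RUNNABLE tasks refcounting a group */
	unsigned int value[NR_GROUPS]; /* clamp value of each group */
};

/* Hypothetical per-task state, like uclamp::effective::group_id + active. */
struct task_state {
	unsigned int effective_group; /* group refcounted while enqueued */
	bool active;                  /* true while refcounting a group */
};

/* Enqueue: refcount the group picked by the slow path and remember it. */
static void clamp_get(struct cpu_clamp *cpu, struct task_state *t,
		      unsigned int group)
{
	t->effective_group = group;
	t->active = true;
	cpu->tasks[group]++;
}

/* Dequeue: drop the group taken at enqueue, whatever changed since. */
static void clamp_put(struct cpu_clamp *cpu, struct task_state *t)
{
	if (!t->active)
		return;
	cpu->tasks[t->effective_group]--;
	t->active = false;
}

/* CPU clamp: max value among groups that still have RUNNABLE tasks. */
static unsigned int cpu_clamp_value(const struct cpu_clamp *cpu)
{
	unsigned int max_value = 0;

	for (unsigned int g = 0; g < NR_GROUPS; g++)
		if (cpu->tasks[g] && cpu->value[g] > max_value)
			max_value = cpu->value[g];
	return max_value;
}
```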
>
> For consistency, as well as to properly inform userspace, the
> sched_getattr(2) call is updated to always return the properly
> aggregated constraints described above. This also makes
> sched_getattr(2) a convenient userspace API to know the utilization
> constraints enforced on a task by the cgroup's CPU controller.
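A sketch of the reporting semantics described above: the values handed back to userspace are the effective ones, so a caller can compare them with what it asked for to detect a cgroup restriction. struct task_clamps, struct util_report, report_clamps() and was_restricted() are hypothetical names; the exact fields returned by any final upstream sched_getattr(2) may differ from this series.

```c
#include <stdbool.h>

enum { UTIL_MIN = 0, UTIL_MAX = 1 };

/* Hypothetical per-task state: what was asked vs. what is enforced. */
struct task_clamps {
	unsigned int requested[2]; /* values passed to sched_setattr(2) */
	unsigned int effective[2]; /* after TG / system default aggregation */
};

/* Hypothetical getattr view: both fields carry effective values. */
struct util_report {
	unsigned int util_min;
	unsigned int util_max;
};

static struct util_report report_clamps(const struct task_clamps *t)
{
	struct util_report r = {
		.util_min = t->effective[UTIL_MIN],
		.util_max = t->effective[UTIL_MAX],
	};
	return r;
}

/* Userspace-side check: did the cgroup restrict the original request? */
static bool was_restricted(unsigned int req_min, unsigned int req_max,
			   struct util_report r)
{
	return r.util_min != req_min || r.util_max != req_max;
}
```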
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@xxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Paul Turner <pjt@xxxxxxxxxx>
> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> Cc: Todd Kjos <tkjos@xxxxxxxxxx>
> Cc: Joel Fernandes <joelaf@xxxxxxxxxx>
> Cc: Steve Muckle <smuckle@xxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxxxxx>
> Cc: Quentin Perret <quentin.perret@xxxxxxx>
> Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> Cc: Morten Rasmussen <morten.rasmussen@xxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx
>
> ---
> Changes in v4:
> Message-ID: <20180816140731.GD2960@e110439-lin>
> - reuse already existing:
> task_struct::uclamp::effective::group_id
> instead of adding:
> task_struct::uclamp_group_id
> to back annotate the effective clamp group in which a task has been
> refcounted
> Others:
> - small documentation fixes
> - rebased on v4.19-rc1
>
> Changes in v3:
> Message-ID: <CAJuCfpFnj2g3+ZpR4fP4yqfxs0zd=c-Zehr2XM7m_C+WdL9jNA@xxxxxxxxxxxxxx>
> - rename UCLAMP_NONE into UCLAMP_NOT_VALID
> - fix not required override
> - fix typos in changelog
> Others:
> - clean up uclamp_cpu_get_id()/sched_getattr() code by moving task's
> clamp group_id/value code into dedicated getter functions:
> uclamp_task_group_id(), uclamp_group_value() and uclamp_task_value()
> - rebased on tip/sched/core
> Changes in v2:
> OSPM discussion:
> - implement "nice" semantics where cgroup clamp values are always
> used to restrict task-specific clamp values, i.e. tasks running in a
> TG are only allowed to demote themselves.
> Other:
> - rebased on v4.18-rc4
> - this code has been split from a previous patch to simplify the review
> ---
> include/linux/sched.h | 9 +++++++
> kernel/sched/core.c | 58 +++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 62 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7698e7554892..4b61fbcb0797 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -609,12 +609,21 @@ struct sched_dl_entity {
> * The active bit is set whenever a task has got an effective clamp group
> * and value assigned, which can be different from the user requested ones.
> * This allows to know a task is actually refcounting a CPU's clamp group.
> + *
> + * The user_defined bit is set whenever a task has got a task-specific clamp
> + * value requested from userspace, i.e. the system defaults apply to this
> + * task just as a restriction. This allows to relax TG's clamps when a less
> + * restrictive task specific value has been defined, thus allowing to
> + * implement a "nice" semantic when both task group and task specific values
> + * are used. For example, a task running on a 20% boosted TG can still drop
> + * its own boosting to 0%.
> */
> struct uclamp_se {
> unsigned int value : SCHED_CAPACITY_SHIFT + 1;
> unsigned int group_id : order_base_2(UCLAMP_GROUPS);
> unsigned int mapped : 1;
> unsigned int active : 1;
> + unsigned int user_defined : 1;
> /*
> * Clamp group and value actually used by a scheduling entity,
> * i.e. a (RUNNABLE) task or a task group.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e2292c698e3b..2ce84d22ab17 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -875,6 +875,28 @@ static inline void uclamp_cpu_update(struct rq *rq, unsigned int clamp_id,
> rq->uclamp.value[clamp_id] = max_value;
> }
>
> +/**
> + * uclamp_apply_defaults: check if p is subject to system default clamps
> + * @p: the task to check
> + *
> + * Tasks in the root group or autogroups are always and only limited by system
> + * defaults. All others instead are limited by their TG's specific value.
> + * This method checks the conditions under which a task is subject to system
> + * default clamps.
> + */
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +static inline bool uclamp_apply_defaults(struct task_struct *p)
> +{
> + if (task_group_is_autogroup(task_group(p)))
> + return true;
> + if (task_group(p) == &root_task_group)
> + return true;
> + return false;
> +}
> +#else
> +#define uclamp_apply_defaults(p) true
> +#endif
> +
> /**
> * uclamp_effective_group_id: get the effective clamp group index of a task
> * @p: the task to get the effective clamp value for
> @@ -882,9 +904,11 @@ static inline void uclamp_cpu_update(struct rq *rq, unsigned int clamp_id,
> *
> * The effective clamp group index of a task depends on:
> * - the task specific clamp value, explicitly requested from userspace
> + * - the task group effective clamp value, for tasks not in the root group or
> + * in an autogroup
> * - the system default clamp value, defined by the sysadmin
> - * and tasks specific's clamp values are always restricted by system
> - * defaults clamp values.
> + * and task-specific clamp values are always restricted, with increasing
> + * priority, by their task group first and the system defaults after.
> *
> * This method returns the effective group index for a task, depending on its
> * status and a proper aggregation of the clamp values listed above.
> @@ -908,6 +932,22 @@ static inline unsigned int uclamp_effective_group_id(struct task_struct *p,
> clamp_value = p->uclamp[clamp_id].value;
> group_id = p->uclamp[clamp_id].group_id;
>
> + if (!uclamp_apply_defaults(p)) {
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> + unsigned int clamp_max =
> + task_group(p)->uclamp[clamp_id].effective.value;
> + unsigned int group_max =
> + task_group(p)->uclamp[clamp_id].effective.group_id;
> +
> + if (!p->uclamp[clamp_id].user_defined ||
> + clamp_value > clamp_max) {
> + clamp_value = clamp_max;
> + group_id = group_max;
> + }
> +#endif
> + goto done;
> + }
> +
> /* RT tasks have different default values */
> default_clamp = task_has_rt_policy(p)
> ? uclamp_default_perf
> @@ -924,6 +964,8 @@ static inline unsigned int uclamp_effective_group_id(struct task_struct *p,
> group_id = default_clamp[clamp_id].group_id;
> }
>
> +done:
> +
> p->uclamp[clamp_id].effective.value = clamp_value;
> p->uclamp[clamp_id].effective.group_id = group_id;
>
> @@ -936,8 +978,10 @@ static inline unsigned int uclamp_effective_group_id(struct task_struct *p,
> * @rq: the CPU's rq where the clamp group has to be reference counted
> * @clamp_id: the clamp index to update
> *
> - * Once a task is enqueued on a CPU's rq, the clamp group currently defined by
> - * the task's uclamp::group_id is reference counted on that CPU.
> + * Once a task is enqueued on a CPU's rq, with increasing priority, we
> + * reference count the most restrictive clamp group between the task specific
> + * clamp value, the clamp value of its task group and the system default clamp
> + * value.
> */
> static inline void uclamp_cpu_get_id(struct task_struct *p, struct rq *rq,
> unsigned int clamp_id)
> @@ -1312,10 +1356,12 @@ static int __setscheduler_uclamp(struct task_struct *p,
>
> /* Update each required clamp group */
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> + p->uclamp[UCLAMP_MIN].user_defined = true;
> uclamp_group_get(p, &p->uclamp[UCLAMP_MIN],
> UCLAMP_MIN, lower_bound);
> }
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> + p->uclamp[UCLAMP_MAX].user_defined = true;
> uclamp_group_get(p, &p->uclamp[UCLAMP_MAX],
> UCLAMP_MAX, upper_bound);
> }
> @@ -1359,8 +1405,10 @@ static void uclamp_fork(struct task_struct *p, bool reset)
> for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> unsigned int clamp_value = p->uclamp[clamp_id].value;
>
> - if (unlikely(reset))
> + if (unlikely(reset)) {
> clamp_value = uclamp_none(clamp_id);
> + p->uclamp[clamp_id].user_defined = false;
> + }
>
> p->uclamp[clamp_id].mapped = false;
> p->uclamp[clamp_id].active = false;
> --
> 2.18.0
>
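The uclamp_fork() hunk above can be sketched as follows: with reset-on-fork, the child drops the parent's request and its user_defined flag, so it falls back to following its TG (or the system defaults). struct clamp_se, clamp_none() and clamp_fork() are hypothetical stand-ins; the "no clamping" values (0 for min, full capacity for max) are an assumption about uclamp_none(), and the rest of the loop body, which the hunk does not show, is reduced here to storing the value.

```c
#include <stdbool.h>

#define CAPACITY_SCALE 1024U /* mirrors SCHED_CAPACITY_SCALE */

enum clamp_id { CLAMP_MIN = 0, CLAMP_MAX, CLAMP_CNT };

/* Hypothetical subset of struct uclamp_se. */
struct clamp_se {
	unsigned int value;
	bool mapped;
	bool active;
	bool user_defined;
};

/* Assumed "no clamping" values: 0 for min, full capacity for max. */
static unsigned int clamp_none(enum clamp_id id)
{
	return id == CLAMP_MIN ? 0 : CAPACITY_SCALE;
}

/* On reset, drop the parent's request so the child follows its TG. */
static void clamp_fork(struct clamp_se child[CLAMP_CNT], bool reset)
{
	for (int id = 0; id < CLAMP_CNT; id++) {
		unsigned int value = child[id].value;

		if (reset) {
			value = clamp_none((enum clamp_id)id);
			child[id].user_defined = false;
		}
		/* The child is not yet refcounted in any CPU clamp group. */
		child[id].mapped = false;
		child[id].active = false;
		child[id].value = value;
	}
}
```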
--
#include <best/regards.h>
Patrick Bellasi