Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max

From: Xuewen Yan
Date: Wed Jun 02 2021 - 22:25:39 EST


+CC Qais


Hi Quentin

On Wed, Jun 2, 2021 at 9:22 PM Quentin Perret <qperret@xxxxxxxxxx> wrote:
>
> +CC Patrick and Tejun
>
> On Wednesday 02 Jun 2021 at 20:38:03 (+0800), Xuewen Yan wrote:
> > From: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> >
> > When setting cpu.uclamp.min/max in cgroup, there is no validating
> > like uclamp_validate() in __sched_setscheduler(). It may cause the
> > cpu.uclamp.min is bigger than cpu.uclamp.max.
>
> ISTR this was intentional. We also allow child groups to ask for
> whatever clamps they want, but that is always limited by the parent, and
> reflected in the 'effective' values, as per the cgroup delegation model.

It does not affect the 'effective' value. That is because there is
protection in cpu_util_update_eff():
/* Ensure protection is always capped by limit */
eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);

When a user sets cpu.uclamp.min > cpu.uclamp.max, for example:
cpu.uclamp.max = 50;
and then sets: cpu.uclamp.min = 60;
the requested values become uclamp_req[UCLAMP_MIN].value = 1024 * 60% = 614
and uclamp_req[UCLAMP_MAX].value = 1024 * 50% = 512.
But in the end, the effective values are
uclamp[UCLAMP_MIN].value = uclamp[UCLAMP_MAX].value = 1024 * 50% = 512.

Is the lack of validation deliberate because of the above?
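
To make the numbers above concrete, here is a small userspace sketch of the arithmetic. It is not the kernel code: percent_to_util() and cap_min_by_max() are hypothetical helper names, and the conversion assumes whole-number percentages rounded to the nearest utilization unit on a 1024 scale.

```c
#define SCHED_CAPACITY_SCALE 1024U

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/*
 * Hypothetical helper: convert a whole-number percentage into a
 * utilization value on the 0..1024 scale, rounding to nearest.
 */
static unsigned int percent_to_util(unsigned int percent)
{
	return (percent * SCHED_CAPACITY_SCALE + 50U) / 100U;
}

/*
 * Hypothetical helper mirroring the quoted protection:
 * the effective min is always capped by the effective max.
 */
static void cap_min_by_max(unsigned int eff[UCLAMP_CNT])
{
	if (eff[UCLAMP_MIN] > eff[UCLAMP_MAX])
		eff[UCLAMP_MIN] = eff[UCLAMP_MAX];
}
```

With min = 60 and max = 50, the requested values come out as 614 and 512, and after the cap both effective values are 512, which matches the description above.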

>
> > Although there is protection in cpu_util_update_eff():
> > “eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX])”, it's better
> > not to let it happen.
> >
> > Judging the uclamp value before setting uclamp_min/max, avoid
> > the cpu.uclamp.min is bigger than cpu.uclamp.max.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> > ---
> > kernel/sched/core.c | 26 +++++++++++++++++++++++++-
> > 1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 5226cc26a095..520a2da40dc9 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8867,6 +8867,30 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> > rcu_read_lock();
> >
> > tg = css_tg(of_css(of));
> > +
> > + switch (clamp_id) {
> > + case UCLAMP_MIN: {
> > + unsigned int uc_req_max = tg->uclamp_req[UCLAMP_MAX].value;
> > +
> > + if (req.util > uc_req_max) {
> > + nbytes = -EINVAL;
> > + goto unlock;
> > + }
> > + break;
> > + }
> > + case UCLAMP_MAX: {
> > + unsigned int uc_req_min = tg->uclamp_req[UCLAMP_MIN].value;
> > +
> > + if (req.util < uc_req_min) {
> > + nbytes = -EINVAL;
> > + goto unlock;
> > + }
> > + break;
> > + }
> > + default:
> > + nbytes = -EINVAL;
> > + goto unlock;
> > + }
> > if (tg->uclamp_req[clamp_id].value != req.util)
> > uclamp_se_set(&tg->uclamp_req[clamp_id], req.util, false);
> >
> > @@ -8878,7 +8902,7 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> >
> > /* Update effective clamps to track the most restrictive value */
> > cpu_util_update_eff(of_css(of));
> > -
> > +unlock:
> > rcu_read_unlock();
> > mutex_unlock(&uclamp_mutex);
> >
> > --
> > 2.25.1
> >

While changing the code, I found this patch:

6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@xxxxxxx

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6a5124c..f97eb73 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1405,7 +1405,6 @@ uclamp_tg_restrict(struct task_struct *p, enum
uclamp_id clamp_id)
{
struct uclamp_se uc_req = p->uclamp_req[clamp_id];
#ifdef CONFIG_UCLAMP_TASK_GROUP
- struct uclamp_se uc_max;

/*
* Tasks in autogroups or root task group will be
@@ -1416,9 +1415,23 @@ uclamp_tg_restrict(struct task_struct *p, enum
uclamp_id clamp_id)
if (task_group(p) == &root_task_group)
return uc_req;

- uc_max = task_group(p)->uclamp[clamp_id];
- if (uc_req.value > uc_max.value || !uc_req.user_defined)
- return uc_max;
+ switch (clamp_id) {
+ case UCLAMP_MIN: {
+ struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
+ if (uc_req.value < uc_min.value)
+ return uc_min;
+ break;
+ }
+ case UCLAMP_MAX: {
+ struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
+ if (uc_req.value > uc_max.value)
+ return uc_max;
+ break;
+ }
+ default:
+ WARN_ON_ONCE(1);
+ break;
+ }
#endif

When clamp_id == UCLAMP_MIN, why not also check whether uc_req.value is
bigger than task_group(p)->uclamp[UCLAMP_MAX]?
Because when p->uclamp_req[UCLAMP_MIN] > task_group(p)->uclamp[UCLAMP_MAX],
the patch cannot clamp p->uclamp_req[UCLAMP_MIN/MAX] into
[ task_group(p)->uclamp[UCLAMP_MIN], task_group(p)->uclamp[UCLAMP_MAX] ].
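
The case I am asking about can be modeled in userspace roughly as follows. This is only a simplified sketch with plain integers instead of struct uclamp_se; tg_restrict() follows the per-clamp logic of the diff quoted above, while tg_restrict_range() is a hypothetical variant used purely to illustrate the question, not a proposed fix.

```c
enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/*
 * Simplified model of the quoted diff: a UCLAMP_MIN request is only
 * raised to the group's min, and a UCLAMP_MAX request is only lowered
 * to the group's max.
 */
static unsigned int tg_restrict(unsigned int req,
				const unsigned int tg[UCLAMP_CNT],
				enum uclamp_id clamp_id)
{
	switch (clamp_id) {
	case UCLAMP_MIN:
		return req < tg[UCLAMP_MIN] ? tg[UCLAMP_MIN] : req;
	case UCLAMP_MAX:
		return req > tg[UCLAMP_MAX] ? tg[UCLAMP_MAX] : req;
	default:
		return req;
	}
}

/*
 * Hypothetical variant: clamp any request into the group's
 * [min, max] range, whichever clamp it belongs to.
 */
static unsigned int tg_restrict_range(unsigned int req,
				      const unsigned int tg[UCLAMP_CNT])
{
	if (req < tg[UCLAMP_MIN])
		return tg[UCLAMP_MIN];
	if (req > tg[UCLAMP_MAX])
		return tg[UCLAMP_MAX];
	return req;
}
```

For a group with min = 0 and max = 512 and a task requesting UCLAMP_MIN = 614, tg_restrict() leaves the request at 614, above the group's max, while the range variant would bring it down to 512.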

Thanks