Re: [PATCH] sched/uclamp: Avoid setting cpu.uclamp.min bigger than cpu.uclamp.max

From: Xuewen Yan
Date: Tue Jun 08 2021 - 07:47:48 EST


First of all, sorry for the late reply.

On Mon, Jun 7, 2021 at 9:49 PM Qais Yousef <qais.yousef@xxxxxxx> wrote:
>
> On 06/05/21 21:24, Xuewen Yan wrote:
> > Hi Qais
> >
> > On Sat, Jun 5, 2021 at 7:49 PM Qais Yousef <qais.yousef@xxxxxxx> wrote:
> > > > >
> > > > In addition,In your patch:
> > > > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > > > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@xxxxxxx
> > > >
> > > > + switch (clamp_id) {
> > > > + case UCLAMP_MIN: {
> > > > + struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> > > > + if (uc_req.value < uc_min.value)
> > > > + return uc_min;
> > > > + break;
> > > >
> > > > When the clamp_id = UCLAMP_MIN, why not judge the uc_req.value is
> > > > bigger than task_group(p)->uclamp[UCLAMP_MAX] ?
> > >
> > > Because of the requirement I pointed you to in cgroup-v2.rst. We must allow any
> > > value to be requested.
> > >
> > > Ultimately if we had
> > >
> > > cpu.uclamp.min = 80
> > > cpu.uclamp.max = 50
> > >
> > > then we want to remember the original request but make sure the effective value
> > > is capped.
> > >
> > > For the user in the future modifies the values such that
> > >
> > > cpu.uclamp.max = max
> > >
> > > Then we want to remember cpu.uclamp.min = 80 and apply it since now the
> > > cpu.uclamp.max was relaxed to allow the boost value.
> > >
> > > > Because when the p->uclamp_req[UCLAMP_MIN] > task_group(p)->uclamp[UCLAMP_MAX],
> > > > the patch can not clamp the p->uclamp_req[UCLAMP_MIN/MAX] into
> > > > [ task_group(p)->uclamp[UCLAMP_MAX], task_group(p)->uclamp[UCLAMP_MAX] ].
> > > >
> > > > Is it necessary to fix it here?
> > >
> > > Nope. We must allow any combination values to be accepted and remember them so
> > > if one changes we ensure the new effective value is updated accordingly.
> > > This is how cgroups API works.
> >
> > Sorry. I may not have expressed it clearly. In your patch (which has
> > not yet merged into the mainline):
> >
> > 6938840392c89 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min")
> > https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@xxxxxxx
> >
> > This patch will not affect p->uclamp_req, but consider the following situation:
> >
> > tg->cpu.uclamp.min = 0
> > tg->cpu.uclamp.max = 50%
> >
> > p->uclamp_req[UCLAMP_MIN] = 60%
> > p->uclamp_req[UCLAMP_MAX] = 80%
> >
> > The function call process is as follows:
> > uclamp_eff_value() -> uclamp_eff_get() ->uclamp_tg_restrict()
> >
> > with your patch, the result is:
> >
> > p->effective_uclamp_min = 60%
> > p->effective_uclamp_max = 50%
> >
> > It would not affect the uclamp_task_util(p), but affect the rq:
> > when p enqueued:
> > rq->uclamp[UCLAMP_MIN] = 60%
> > rq->uclamp[UCLAMP_MAX] = 50%
> >
> > furthermore, in uclamp_rq_util_with() {
> > ...
> >
> > min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value); //60%
> > max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);//50%
> > ...
> > if (unlikely(min_util >= max_util))
> > return min_util;
> >
> > return clamp(util, min_util, max_util);
> > ...
> > }
> > as a result, it would return 60%.
>
> Looking at this again now, I better understand what you were trying to say.
> I got confused that you were still arguing about cgroup inverted
> cpu.uclamp.min/max, but you're actually talking about something else.

Generally speaking, this situation should rarely occur in practice; I am just trying to account for every configuration users could set up.

>
> It would be a lot easier to not cross talk threads and reply to my patch
> directly with this remark.
Sorry for the trouble; I was unfamiliar with the mailing list conventions and will pay attention next time :)

>
> Anyways, still well spotted!
>
> What you're saying is we need something like the patch below to ensure that the
> *task request* is within tg uclamp range, right? The worry is that the task's
> uclamp_min is higher than the tg's uclamp_min, so we end up with the inversion
> because of that which will not be corrected later.

Yeah, the task's uclamp_min is higher than the tg's uclamp_max.
>
> Hmm I need to think a bit more about this..
>
> Cheers
>
> --
> Qais Yousef
>
> --->8---
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9e9a5be35cde..e867813b9d5e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1405,6 +1405,7 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> {
> struct uclamp_se uc_req = p->uclamp_req[clamp_id];
> #ifdef CONFIG_UCLAMP_TASK_GROUP
> + unsigned long uc_min, uc_max, val;
>
> /*
> * Tasks in autogroups or root task group will be
> @@ -1415,23 +1416,10 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> if (task_group(p) == &root_task_group)
> return uc_req;
>
> - switch (clamp_id) {
> - case UCLAMP_MIN: {
> - struct uclamp_se uc_min = task_group(p)->uclamp[clamp_id];
> - if (uc_req.value < uc_min.value)
> - return uc_min;
> - break;
> - }
> - case UCLAMP_MAX: {
> - struct uclamp_se uc_max = task_group(p)->uclamp[clamp_id];
> - if (uc_req.value > uc_max.value)
> - return uc_max;
> - break;
> - }
> - default:
> - WARN_ON_ONCE(1);
> - break;
> - }
> + uc_min = task_group(p)->uclamp[UCLAMP_MIN].value;
> + uc_max = task_group(p)->uclamp[UCLAMP_MAX].value;
> + val = uc_req.value;
> + uc_req.value = clamp(val, uc_min, uc_max);

This is not a complete solution, because it clamps only uc_req.value; uc_req.bucket_id is left unchanged, so it still corresponds to the pre-clamp value.

> #endif
>
> return uc_req;
>

Thanks!
xuewen