Re: [PATCH] sched/rt: Add a new sysctl to control uclamp_util_min

From: Patrick Bellasi
Date: Wed Jan 08 2020 - 14:09:08 EST


On 08-Jan 14:44, Peter Zijlstra wrote:
> On Fri, Dec 20, 2019 at 04:48:38PM +0000, Qais Yousef wrote:
> > RT tasks by default try to run at the highest capacity/performance
> > level. When uclamp is selected this default behavior is retained by
> > enforcing the uclamp_util_min of the RT tasks to be
> > uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
> > value.
> >
> > See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").
> >
> > On battery powered devices, this default behavior could consume more
> > power, and it is desired to be able to tune it down. While uclamp allows
> > tuning this by changing the uclamp_util_min of the individual tasks,
> > this is cumbersome and error prone.
> >
> > To control the default behavior globally by system admins and device
> > integrators, introduce the new sysctl_sched_rt_uclamp_util_min to
> > change the default uclamp_util_min value of the RT tasks.
> >
> > Whenever the new default changes, it'd be applied on the next wakeup of
> > the RT task, assuming that it still uses the system default value and
> > not a user applied one.
>
> This is because these RT tasks are not in a cgroup or not affected by
> cgroup settings? I feel the justification is a little thin here.

RT tasks are kind of special right now. To keep the initial
implementation simple, we hardcoded the behavior: always run at max OPP
unless a task-specific value explicitly asks for something else.

To add a system wide setting specifically for RT tasks, we need to
generalize what we already do for CFS tasks and keep the behavior of
the two classes aligned (apart from the default value).
IOW, no rt.c specific code should be required.
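
To illustrate the idea, here is a rough user-space sketch, not kernel
code. The struct and function names (clamp_req, clamp_defaults,
effective_util_min) are made up for this mail; the only things taken
from the kernel are SCHED_CAPACITY_SCALE and the fact that the CFS
default for util_min is 0. The point is that both classes go through the
same lookup and only the default differs:

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024U

/* Hypothetical per-task request: the value plus who set it. */
struct clamp_req {
	unsigned int value;
	bool user_defined;	/* true if set via a task-specific request */
};

/* Hypothetical system-wide defaults, one per scheduling class. */
struct clamp_defaults {
	unsigned int rt_util_min;	/* tunable; SCHED_CAPACITY_SCALE today */
	unsigned int cfs_util_min;	/* 0, i.e. no boost by default */
};

/*
 * Same path for RT and CFS: a task-specific value always wins,
 * otherwise fall back to the class default.
 */
static unsigned int effective_util_min(const struct clamp_req *req,
				       bool is_rt,
				       const struct clamp_defaults *def)
{
	if (req->user_defined)
		return req->value;

	return is_rt ? def->rt_util_min : def->cfs_util_min;
}
```

With rt_util_min left at SCHED_CAPACITY_SCALE this gives today's
behavior; lowering it only affects RT tasks that never asked for a
specific value.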

> > If the uclamp_util_min of an RT task is 0, then the RT utilization of
> > the rq is used to drive the frequency selection in schedutil for RT
> > tasks.
>
> Did cpu_uclamp_write() forget to check for input<0 ?

The cgroup API uses percentages, and only sanitized values in the
[0..100].00 range are accepted.

Moreover, capacity_from_percent() returns a uclamp_request.util which
is a u64. Thus, there should not be issues related to negative values.
Writing such a value should just fail the write syscall.
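
For reference, a stand-alone sketch of that kind of conversion. This is
an approximation written for this mail and not the kernel
implementation: parse_percent() and util_from_percent() are made-up
names, and the parsing rules (at most two decimals, nothing but digits
and one dot) are my assumption. A leading '-' is simply not a digit, so
the parse fails before any unsigned math happens:

```c
#include <ctype.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define PERCENT_SCALE		10000ULL	/* 100.00% with two decimals */

/*
 * Parse "NN" or "NN.NN" into percent * 100.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_percent(const char *s, uint64_t *out)
{
	uint64_t whole = 0, frac = 0, val;
	int digits = 0, fdigits = 0;

	for (; isdigit((unsigned char)*s); s++, digits++) {
		whole = whole * 10 + (uint64_t)(*s - '0');
		if (whole > 100)
			return -1;	/* bail out early, cannot overflow */
	}
	if (digits == 0)
		return -1;		/* catches "-1", "" and ".5" */

	if (*s == '.') {
		for (s++; isdigit((unsigned char)*s); s++, fdigits++) {
			if (fdigits == 2)
				return -1;	/* more than two decimals */
			frac = frac * 10 + (uint64_t)(*s - '0');
		}
		if (fdigits == 1)
			frac *= 10;	/* "12.5" means 12.50 */
	}
	if (*s != '\0' && *s != '\n')
		return -1;		/* trailing garbage */

	val = whole * 100 + frac;
	if (val > PERCENT_SCALE)
		return -1;		/* e.g. "100.01" */

	*out = val;
	return 0;
}

/* Scale a fixed-point percentage to [0..1024], rounding to nearest. */
static uint64_t util_from_percent(uint64_t percent)
{
	return ((percent << SCHED_CAPACITY_SHIFT) + PERCENT_SCALE / 2) /
	       PERCENT_SCALE;
}
```

So "50" maps to 512 and "100.00" to 1024, while "-1" and "100.01" are
rejected at parse time.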


--
#include <best/regards.h>

Patrick Bellasi