Re: [PATCH] sched/rt: Add a new sysctl to control uclamp_util_min

From: Qais Yousef
Date: Tue Jan 14 2020 - 16:34:12 EST


On 01/09/20 10:21, Patrick Bellasi wrote:
> That's not entirely true. In that patch we introduce cgroup support
> but, if you look at the code before that patch, for CFS tasks there is
> only:
> - CFS task-specific values (min,max)=(0,1024) by default
> - CFS system-wide tunables (min,max)=(1024,1024) by default
> and a change on the system-wide tunable allows for example to enforce
> a uclamp_max=200 on all tasks.
>
> A similar solution can be implemented for RT tasks, where we have:
> - RT task-specific values (min,max)=(1024,1024) by default
> - RT system-wide tunables (min,max)=(1024,1024) by default
> and a change on the system-wide tunable allows for example to enforce
> a uclamp_min=200 on all tasks.
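The per-class scheme described in the quote above can be modelled in plain C. This is a user-space sketch, not kernel code. `struct clamp_pair`, `cfs_sys_default`, `rt_sys_default` and `eff_value()` are hypothetical names. The sketch assumes, as uclamp_eff_get() does today, that a system-wide default acts as an upper bound on whatever value the task requested.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024u

enum clamp_id { CLAMP_MIN = 0, CLAMP_MAX = 1 };

/* Hypothetical (min, max) pair, indexed with enum clamp_id. */
struct clamp_pair {
	unsigned int v[2];
};

/* Hypothetical per-class system-wide tunables, both (1024, 1024) by default. */
static struct clamp_pair cfs_sys_default = { { SCHED_CAPACITY_SCALE, SCHED_CAPACITY_SCALE } };
static struct clamp_pair rt_sys_default  = { { SCHED_CAPACITY_SCALE, SCHED_CAPACITY_SCALE } };

/* Task-specific defaults: CFS (0, 1024), RT (1024, 1024). */
static const struct clamp_pair cfs_task_default = { { 0, SCHED_CAPACITY_SCALE } };
static const struct clamp_pair rt_task_default  = { { SCHED_CAPACITY_SCALE, SCHED_CAPACITY_SCALE } };

/* The system default of the task's class caps whatever the task requested. */
static unsigned int eff_value(const struct clamp_pair *req, bool is_rt,
			      enum clamp_id id)
{
	const struct clamp_pair *sys = is_rt ? &rt_sys_default : &cfs_sys_default;
	unsigned int v = req->v[id] < sys->v[id] ? req->v[id] : sys->v[id];

	assert(v <= SCHED_CAPACITY_SCALE);
	return v;
}
```

For example, writing 200 into `cfs_sys_default.v[CLAMP_MAX]` makes every CFS task's effective uclamp_max 200. Writing 200 into `rt_sys_default.v[CLAMP_MIN]` makes every RT task's effective uclamp_min 200, because the RT request of 1024 is capped by the new default.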

I feel I'm already getting lost in the complexity of the interaction here. Do
we really need to go down that path?

So we would end up with a system-wide default for all tasks, plus a CFS
system-wide default, plus an RT system-wide default?

As I understand it, we have a single system-wide default now.

>
> > (Would we need CONFIG_RT_GROUP_SCHED for this? IIRC there are a few pain points
> > when turning it on, but I think we don't have to if we just want things like
> > uclamp value propagation?)
>
> No, the current design for CFS tasks also works on !CONFIG_CFS_GROUP_SCHED.
> That's because in this case:
> - uclamp_tg_restrict() returns just the task requested value
> - uclamp_eff_get() _always_ restricts the requested value considering
> the system defaults
>
> > It's quite a bit more work than the simple thing Qais is introducing (on both
> > the user and kernel side).
>
> But if in the future we want to extend CGroup support to RT, then we
> will feel the pain, because we would be doing the effective computation
> in two different places.
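The two-step computation quoted above, where uclamp_tg_restrict() passes the task's request through when there is no group support and uclamp_eff_get() always applies the system default, can be sketched as follows. This is a simplified user-space model with hypothetical names (`tg_restrict()`, `eff_get()`, and a `group` pointer that is NULL to model the !group case), not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical per-group clamp value for one clamp index. */
struct group_clamp {
	unsigned int value;
};

/* Hypothetical task: its own request plus an optional group. */
struct task_clamp {
	unsigned int req;                 /* task-specific requested value */
	const struct group_clamp *group;  /* NULL models no group support */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Step 1: without a group, the task's request passes through unchanged. */
static unsigned int tg_restrict(const struct task_clamp *t)
{
	if (!t->group)
		return t->req;
	return min_u(t->req, t->group->value);
}

/* Step 2: the system default always caps the result of step 1. */
static unsigned int eff_get(const struct task_clamp *t, unsigned int sys_default)
{
	unsigned int v = min_u(tg_restrict(t), sys_default);

	assert(v <= SCHED_CAPACITY_SCALE);
	return v;
}
```

In this model every restriction happens inside eff_get(). A per-class default applied anywhere else would be a second place computing the effective value, which is the duplication Patrick is warning about.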

Hmm, so what you're suggesting here is that we want to have
cpu.rt.uclamp.{min,max}? I'm not sure I agree this is a good idea.

It makes more sense to create a special group for all RT tasks rather than
treat RT tasks in a cgroup differently.

>
> Do note that proper CGroup support requires that the system default
> values define the values for the root group and are consistently
> propagated down the hierarchy. Thus we would need to add a dedicated
> pair of cgroup attributes, e.g. cpu.util.rt.{min,max}.
>
> To recap, we don't need CGROUP support right now, just a new default
> tracking similar to what we do for CFS.
>
> We already proposed such a support in one of the initial versions of
> the uclamp series:
> Message-ID: <20190115101513.2822-10-patrick.bellasi@xxxxxxx>
> https://lore.kernel.org/lkml/20190115101513.2822-10-patrick.bellasi@xxxxxxx/
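A minimal model of the propagation described in the quote above: the root group takes the system default, and each child's effective value is its own request capped by its parent's effective value. `struct cg_node`, `propagate()` and `propagate_from_root()` are hypothetical names. The sketch assumes this min-against-parent rule and ignores locking and per-CPU updates.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical cgroup node holding one clamp value (e.g. uclamp.min). */
struct cg_node {
	unsigned int req;   /* value written to this group's attribute */
	unsigned int eff;   /* effective value after propagation */
	struct cg_node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Pre-order walk: a child never ends up above its parent's effective value. */
static void propagate(struct cg_node *node, unsigned int parent_eff)
{
	node->eff = node->req < parent_eff ? node->req : parent_eff;
	for (size_t i = 0; i < node->nr_children; i++)
		propagate(node->children[i], node->eff);
}

/* The system default defines the root group's value. */
static void propagate_from_root(struct cg_node *root, unsigned int sys_default)
{
	assert(root != NULL);
	root->req = sys_default;
	propagate(root, sys_default);
}
```

Lowering the system default therefore lowers the effective value of every group below the root in a single walk, while raising it lets each group fall back to its own request.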

IIUC, what you're suggesting is:

1. Use the sysctl to specify the default_rt_uclamp_min.
2. Enforce this value in uclamp_eff_get() rather than in my sync logic.
3. Remove the current hack that always sets
   rt_task->uclamp_min = uclamp_none(UCLAMP_MAX).
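The contrast in step 2 can be sketched in user space as two approaches. One pushes the sysctl value into every RT task when it is written; this assumes that is roughly what the sync logic amounts to. The other only updates a global and caps each task's request when the effective value is read. All names (`struct rt_task`, `rt_default_uclamp_min`, `sync_on_write()`, `set_on_write()`, `eff_min_on_read()`) are hypothetical. The sketch keeps the current RT request of 1024, so it does not model step 3.

```c
#include <assert.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical RT task state: only the requested uclamp_min. */
struct rt_task {
	unsigned int req_min;   /* 1024 for RT tasks today */
};

/* Hypothetical global backing the sysctl. */
static unsigned int rt_default_uclamp_min = SCHED_CAPACITY_SCALE;

/* Sync approach: on sysctl write, walk every RT task and overwrite it. */
static void sync_on_write(struct rt_task *tasks, size_t n, unsigned int val)
{
	assert(val <= SCHED_CAPACITY_SCALE);
	rt_default_uclamp_min = val;
	for (size_t i = 0; i < n; i++)
		tasks[i].req_min = val;
}

/* Read-time approach: on sysctl write, only the global changes... */
static void set_on_write(unsigned int val)
{
	assert(val <= SCHED_CAPACITY_SCALE);
	rt_default_uclamp_min = val;
}

/* ...and each lookup caps the task's request by the current default. */
static unsigned int eff_min_on_read(const struct rt_task *t)
{
	return t->req_min < rt_default_uclamp_min ? t->req_min
						  : rt_default_uclamp_min;
}
```

In this sketch the sync approach costs a walk over all RT tasks on every write, and a task that becomes RT after the write would also need updating. The read-time approach costs one comparison per lookup and keeps the computation in a single function.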

If I got that right, I'd be happy to experiment with it. Otherwise, I'm
afraid I'm failing to see the crux of the problem you're trying to
highlight.

Thanks

--
Qais Yousef