Re: [PATCH v6 05/16] sched/core: uclamp: Update CPU's refcount on clamp changes

From: Peter Zijlstra
Date: Wed Jan 23 2019 - 04:16:43 EST


On Tue, Jan 22, 2019 at 03:33:15PM +0000, Patrick Bellasi wrote:
> On 22-Jan 15:57, Peter Zijlstra wrote:
> > On Tue, Jan 22, 2019 at 02:01:15PM +0000, Patrick Bellasi wrote:

> > > Yes, I would say we have two options:
> > >
> > > 1) SCHED_FLAG_KEEP_POLICY enforces all the scheduling class specific
> > > attributes, but cross class attributes (e.g. uclamp)
> > >
> > > 2) add SCHED_KEEP_NICE, SCHED_KEEP_PRIO, and SCED_KEEP_PARAMS
> > > and use them in the if conditions in D)
> >
> > So the current KEEP_POLICY basically provides sched_setparam(), and
>
> But it's not exposed user-space.

Correct; not until your first patch indeed.

> > given we have that as a syscall, that is supposedly a useful
> > functionality.
>
> For uclamp is definitively useful to change clamps without the need to
> read beforehand the current policy params and use them in a following
> set syscall... which is racy pattern.

Right; but my argument was mostly that if sched_setparam() is a useful
interface, a 'pure' KEEP_POLICY would be too and your (1) looses that.

> > And I suppose the UTIL_CLAMP is !KEEP_UTIL; we could go either way
> > around with that flag.
>
> What about getting rid of the racy case above by exposing userspace
> only the new UTIL_CLAMP and, on:
>
> sched_setscheduler(flags: UTIL_CLAMP)
>
> we enforce the other two flags from the syscall:
>
> ---8<---
> SYSCALL_DEFINE3(sched_setattr)
> if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY) {
> attr.sched_policy = SETPARAM_POLICY;
> attr.sched_flags |= (KEEP_POLICY|KEEP_PARAMS);
> }
> ---8<---
>
> This will not make possible to change class and set flags in one go,
> but honestly that's likely a very limited use-case, isn't it ?

So I must admit to not seeing much use for sched_setparam() (and its
equivalents) myself, but given it is an existing interface, I also think
it would be nice to cover that functionality in the sched_setattr()
call.

That is; I know of userspace priority-ceiling implementations using
sched_setparam(), but I don't see any reason why that wouldn't also work
with sched_setscheduler() (IOW always also set the policy).

> > > In both cases the goal should be to return from code block D).
> >
> > I don't think so; we really do want to 'goto change' for util changes
> > too I think. Why duplicate part of that logic?
>
> But that will force a dequeue/enqueue... isn't too much overhead just
> to change a clamp value?

These syscalls aren't what I consider fast paths anyway. However, there
are people that rely on the scheduler syscalls not to schedule
themselves, or rather be non-blocking (see for example that prio-ceiling
implementation).

And in that respect the newly introduced uclamp_mutex does appear to be
a problem.

Also; do you expect these clamp values to be changed often?

> Perhaps we can also end up with some wired

s/wired/weird/, right?

> side-effects like the task being preempted ?

Nothing worse than any other random reschedule would cause.

> Consider also that the uclamp_task_update_active() added by this patch
> not only has lower overhead but it will be use also by cgroups where
> we want to force update all the tasks on a cgroup's clamp change.

I haven't gotten that far; but I would prefer not to have two different
'change' paths in __sched_setscheduler().