Re: [PATCH v2 2/3] sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS
From: Peter Zijlstra
Date: Fri Jun 11 2021 - 05:20:55 EST

On Fri, Jun 11, 2021 at 08:59:25AM +0000, Quentin Perret wrote:
> On Thursday 10 Jun 2021 at 21:15:45 (+0200), Peter Zijlstra wrote:
> > On Thu, Jun 10, 2021 at 03:13:05PM +0000, Quentin Perret wrote:
> > > SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that
> > > the call must not touch scheduling parameters (nice or priority). This
> > > is particularly handy for uclamp when used in conjunction with
> > > SCHED_FLAG_KEEP_POLICY as that allows issuing a syscall that only
> > > impacts uclamp values.
> > >
> > > However, sched_setattr always checks whether the priorities and nice
> > > values passed in sched_attr are valid first, even if those never get
> > > used down the line. This is useless at best since userspace can
> > > trivially bypass this check to set the uclamp values by specifying low
> > > priorities. However, it is cumbersome to do so as there is no single
> > > expression of this that skips both RT and CFS checks at once. As such,
> > > userspace needs to query the task policy first with e.g. sched_getattr
> > > and then set sched_attr.sched_priority accordingly. This is racy and
> > > slower than a single call.
> > >
> > > As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS
> > > is specified, simply inherit them in this case to match the policy
> > > inheritance of SCHED_FLAG_KEEP_POLICY.
> > >
> > > Reported-by: Wei Wang <wvw@xxxxxxxxxx>
> > > Signed-off-by: Quentin Perret <qperret@xxxxxxxxxx>
> > > ---
> > > kernel/sched/core.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 3b213402798e..1d4aedbbcf96 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -6585,6 +6585,10 @@ SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> > >  	rcu_read_unlock();
> > >
> > >  	if (likely(p)) {
> > > +		if (attr.sched_flags & SCHED_FLAG_KEEP_PARAMS) {
> > > +			attr.sched_priority = p->rt_priority;
> > > +			attr.sched_nice = task_nice(p);
> > > +		}
> > >  		retval = sched_setattr(p, &attr);
> > >  		put_task_struct(p);
> > >  	}
> >
> > I don't like this much... afaict the KEEP_PARAMS clause in
> > __setscheduler() also covers the DL params, and you 'forgot' to copy
> > those.
> >
> > Can't we short circuit the validation logic?
>
> I think we can but I didn't like the look of it, because we end up
> sprinkling checks all over the place. KEEP_PARAMS doesn't imply
> KEEP_POLICY IIUC, and the policy and params checks are all mixed up.
>
> But maybe that wants fixing too?

If you can make that code nicer, I'm all for it, it's a bit of a mess.
But failing that, I suppose the alternative is extracting something like
get_params from sched_getattr() and sharing that bit of code to do what
you do above.

> I guess it could make sense to switch
> policies without touching the params in some cases (e.g. switching
> between FIFO and RR, or BATCH and NORMAL), but I'm not sure what that
> would mean for cross-sched_class transitions.

You're right, cross-class needs both.
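
For reference, the userspace side of what the changelog describes could
look roughly like the sketch below: a single sched_setattr() call that
only touches the uclamp values. This is an illustration and not code
from the patch. The struct is a local copy of the uapi struct sched_attr
layout under a made-up name (struct uclamp_sched_attr), the UCLAMP_FLAG_*
macros are local names for the SCHED_FLAG_* values in
include/uapi/linux/sched.h, and both helper functions are hypothetical.
It leaves sched_priority and sched_nice at zero, so it assumes a kernel
that does not validate them when SCHED_FLAG_KEEP_PARAMS is set; without
that, the call is expected to fail with EINVAL for RT tasks.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local names for the SCHED_FLAG_* values in include/uapi/linux/sched.h. */
#define UCLAMP_FLAG_KEEP_POLICY		0x08
#define UCLAMP_FLAG_KEEP_PARAMS		0x10
#define UCLAMP_FLAG_UTIL_CLAMP_MIN	0x20
#define UCLAMP_FLAG_UTIL_CLAMP_MAX	0x40

/* Local copy of the uapi struct sched_attr layout (hypothetical name). */
struct uclamp_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/*
 * Build an attr that asks the kernel to keep both the policy and the
 * scheduling parameters, and to update only the two clamp values.
 * Priority and nice are left at zero on purpose.
 */
struct uclamp_sched_attr uclamp_only_attr(uint32_t util_min, uint32_t util_max)
{
	struct uclamp_sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_flags = UCLAMP_FLAG_KEEP_POLICY | UCLAMP_FLAG_KEEP_PARAMS |
			   UCLAMP_FLAG_UTIL_CLAMP_MIN | UCLAMP_FLAG_UTIL_CLAMP_MAX;
	attr.sched_util_min = util_min;
	attr.sched_util_max = util_max;
	return attr;
}

/* Issue the syscall; returns 0 on success, -1 with errno set on failure. */
int set_uclamp(pid_t pid, uint32_t util_min, uint32_t util_max)
{
	struct uclamp_sched_attr attr = uclamp_only_attr(util_min, util_max);

	return (int)syscall(SYS_sched_setattr, pid, &attr, 0U);
}
```

A caller would use something like set_uclamp(0, 128, 512) for the
current task, with no prior sched_getattr() and so no window in which
the task's policy can change between the query and the update.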