Re: [PATCH] sched: Optimize pick_next_task for idle_sched_class too

From: Peter Zijlstra
Date: Thu Feb 23 2017 - 12:49:35 EST


On Thu, Feb 23, 2017 at 10:59:15PM +0530, Pavan Kondeti wrote:
> On Thu, Feb 23, 2017 at 10:07 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, Feb 23, 2017 at 04:25:33PM +0100, Peter Zijlstra wrote:
> >>
> >> Ah, I read your question wrong. Yes I think you're right, we now lose
> >> the pull when the last RT task goes away.
> >>
> >> Hmm.. how to fix that nicely..
> >
> > Something like so perhaps? This would make a pull happen when the last
> > RT task on this CPU goes away.
> >
> > Steve?
> >
> > ---
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 9f3e40226dec..283d591078b0 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1336,6 +1336,9 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
> >  	dequeue_rt_entity(rt_se, flags);
> >
> >  	dequeue_pushable_task(rq, p);
> > +
> > +	if (!rq->rt.rt_nr_running)
> > +		queue_pull_task(rq);
> >  }
> >
> > /*
>
> The next balance_callback() is not called until the context switch is
> completed. So we potentially pick a lower class task before the pull
> happens. Would it be wrong to call pull_rt_task() directly instead of
> queuing the callback?

The deactivate_task()...->dequeue_task_rt() path runs with rq->lock held
and cannot drop it, which a pull would require.
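
To make the ordering problem concrete, here is a rough user-space toy
model, not kernel code. Every toy_* name is made up for illustration,
and the may_drop_lock flag only stands in for "the caller is allowed to
drop rq->lock". It assumes the queued callback runs only after the next
task has already been picked, as Pavan describes above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all toy_* names are hypothetical, not kernel APIs. */
struct toy_rq {
	bool lock_held;		/* stands in for rq->lock being held */
	int rt_nr_running;	/* RT tasks queued on this CPU */
	int remote_rt_tasks;	/* RT tasks that could be pulled from elsewhere */
	bool pull_queued;	/* stands in for a queued balance callback */
};

enum toy_class { TOY_RT, TOY_FAIR };

/*
 * A pull has to take another runqueue's lock, which may mean dropping
 * our own first. Model that as: pulling is refused unless the caller
 * says dropping the lock is allowed.
 */
static bool toy_pull_rt_task(struct toy_rq *rq, bool may_drop_lock)
{
	if (!may_drop_lock || rq->remote_rt_tasks == 0)
		return false;
	rq->remote_rt_tasks--;
	rq->rt_nr_running++;
	return true;
}

/* Dequeue path: lock is held and must stay held, so only queue the pull. */
static void toy_dequeue_task_rt(struct toy_rq *rq)
{
	assert(rq->lock_held);
	assert(rq->rt_nr_running > 0);
	rq->rt_nr_running--;
	if (rq->rt_nr_running == 0)
		rq->pull_queued = true;
}

/* Pick the highest class that has something runnable right now. */
static enum toy_class toy_pick_next_task(const struct toy_rq *rq)
{
	return rq->rt_nr_running > 0 ? TOY_RT : TOY_FAIR;
}

/* Runs after the switch has completed; only now does the pull happen. */
static void toy_balance_callback(struct toy_rq *rq)
{
	if (rq->pull_queued) {
		rq->pull_queued = false;
		toy_pull_rt_task(rq, true);
	}
}
```

In this model, dequeuing the last RT task and then picking yields
TOY_FAIR even though an RT task was available remotely; the RT task only
shows up once toy_balance_callback() has run.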

Hurm.. maybe we should do what Steve initially suggested. The
alternative is link order trickery, and I'm not sure we want to do that.