Re: [PATCH -tip 02/32] sched: Introduce sched_class::pick_task()
From: Peter Zijlstra
Date: Thu Nov 26 2020 - 07:41:02 EST
On Thu, Nov 26, 2020 at 11:17:48AM +0100, Vincent Guittot wrote:
> > Something like so then?
>
> yes. it seems ok
>
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6982,20 +6982,29 @@ static void check_preempt_wakeup(struct
> > #ifdef CONFIG_SMP
> > static struct task_struct *pick_task_fair(struct rq *rq)
> > {
> > struct sched_entity *se;
> > + struct cfs_rq *cfs_rq;
> > +
> > +again:
> > + cfs_rq = &rq->cfs;
> > if (!cfs_rq->nr_running)
> > return NULL;
> >
> > do {
> > struct sched_entity *curr = cfs_rq->curr;
> >
> > + /* When we pick for a remote RQ, we'll not have done put_prev_entity() */
> > + if (curr) {
> > + if (curr->on_rq)
> > + update_curr(cfs_rq);
> > + else
> > + curr = NULL;
> >
> > + if (unlikely(check_cfs_rq_runtime(cfs_rq)))
> > + goto again;
Head-ache though; pick_task() was supposed to be stateless, but now
we're modifying a remote runqueue... I suppose it still works, because
irrespective of which task we end up picking (even idle), we'll schedule
the remote CPU, which would've resulted in the same (and possibly
triggered a reschedule if we'd not done it here).
There's a wrinkle through, other than in schedule(), where we dequeue()
and keep running with the current task while we release rq->lock, this
has preemption enabled as well.
This means that if we do this, the remote CPU could preempt, but the
task is then no longer on the runqueue.
I _think_ it all still works, but yuck!
> > + }
> >
> > + se = pick_next_entity(cfs_rq, curr);
> > cfs_rq = group_cfs_rq(se);
> > } while (cfs_rq);
> >