Re: [PATCH sched_ext/for-6.12] sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()
From: Peter Zijlstra
Date: Fri Sep 06 2024 - 05:04:44 EST
On Thu, Sep 05, 2024 at 03:17:13PM -1000, Tejun Heo wrote:
> On Thu, Sep 05, 2024 at 06:41:42AM -1000, Tejun Heo wrote:
> > > @@ -12716,6 +12716,12 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> > >  	if (this_rq->cfs.h_nr_running && !pulled_task)
> > >  		pulled_task = 1;
> > >  
> > > +	/*
> > > +	 * We pulled a task, but it got stolen before we re-acquired rq->lock.
> > > +	 */
> > > +	if (!this_rq->cfs.h_nr_running && pulled_task)
> > > +		pulled_task = 0;
> > > +
> >
> > Lemme test that.
>
> Did a bit of testing and it seems like it's mostly coming from delayed
> dequeue handling. pick_next_entity() does this:
>
> 	struct sched_entity *se = pick_eevdf(cfs_rq);
> 	if (se->sched_delayed) {
> 		dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
> 		SCHED_WARN_ON(se->sched_delayed);
> 		SCHED_WARN_ON(se->on_rq);
> 		return NULL;
> 	}
>
> rq->cfs.nr_running includes the number of delay dequeued tasks which aren't
> really runnable, so it seems like balance_fair() is saying yes and
> pick_next_entity() is then hitting a delayed task.

Duh, yes.

> Maybe the solution is
> tracking the number of delayed ones and subtracting that from nr_running?

That came up yesterday for something else as well. Let me see if I can
make that happen.

Anyway, I suppose you keep your patch for now until I've managed to sort
this out.
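
For illustration only, the accounting idea discussed above (track the
number of delayed-dequeue entities and subtract it from nr_running) can
be sketched as a toy user-space model. This is not kernel code and not
the eventual fix: struct toy_cfs_rq, the nr_delayed field and every
toy_*() helper are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a CFS runqueue; NOT the kernel's struct cfs_rq. */
struct toy_cfs_rq {
	unsigned int nr_running;	/* queued entities, delayed ones included */
	unsigned int nr_delayed;	/* hypothetical: queued but sched_delayed */
};

/* A task becomes runnable and is queued. */
static void toy_enqueue(struct toy_cfs_rq *rq)
{
	rq->nr_running++;
}

/* A task blocks but its dequeue is deferred: it stays in nr_running. */
static void toy_delay_dequeue(struct toy_cfs_rq *rq)
{
	assert(rq->nr_delayed < rq->nr_running);
	rq->nr_delayed++;
}

/* The deferred dequeue finally happens (e.g. when the pick path hits it). */
static void toy_finish_delayed_dequeue(struct toy_cfs_rq *rq)
{
	assert(rq->nr_delayed > 0 && rq->nr_running > 0);
	rq->nr_delayed--;
	rq->nr_running--;
}

/* Entities that could actually be picked to run. */
static unsigned int toy_nr_runnable(const struct toy_cfs_rq *rq)
{
	return rq->nr_running - rq->nr_delayed;
}

/* Naive check: says "yes" even if only delayed entities are queued. */
static bool toy_balance_naive(const struct toy_cfs_rq *rq)
{
	return rq->nr_running > 0;
}

/* Adjusted check: ignores delayed entities. */
static bool toy_balance_adjusted(const struct toy_cfs_rq *rq)
{
	return toy_nr_runnable(rq) > 0;
}
```

With one task queued and then delay-dequeued, toy_balance_naive()
reports work while toy_balance_adjusted() does not, which mirrors the
mismatch between the balance path saying yes and the pick path then
returning NULL.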