Re: [PATCH sched_ext/for-6.12] sched_ext: Handle cases where pick_task_scx() is called without preceding balance_scx()

From: Tejun Heo
Date: Thu Sep 05 2024 - 21:17:24 EST


On Thu, Sep 05, 2024 at 06:41:42AM -1000, Tejun Heo wrote:
> > @@ -12716,6 +12716,12 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> >  	if (this_rq->cfs.h_nr_running && !pulled_task)
> >  		pulled_task = 1;
> >
> > +	/*
> > +	 * We pulled a task, but it got stolen before we re-acquired rq->lock.
> > +	 */
> > +	if (!this_rq->cfs.h_nr_running && pulled_task)
> > +		pulled_task = 0;
> > +
>
> Lemme test that.

Did a bit of testing and it seems like it's mostly coming from delayed
dequeue handling. pick_next_entity() does this:

	struct sched_entity *se = pick_eevdf(cfs_rq);
	if (se->sched_delayed) {
		dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
		SCHED_WARN_ON(se->sched_delayed);
		SCHED_WARN_ON(se->on_rq);
		return NULL;
	}

rq->cfs.nr_running includes delay-dequeued tasks, which aren't really
runnable, so it looks like balance_fair() says yes and pick_next_entity()
then hits a delayed task and returns NULL. Maybe the solution is tracking
the number of delayed ones and subtracting that from nr_running? I'm trying
that but can't get the delayed count straight for some reason.
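
To make the counting idea concrete, here is a minimal userspace sketch. It
is not kernel code and not a patch: struct toy_se, struct toy_cfs_rq, the
nr_delayed field and every toy_*() helper are hypothetical names made up for
illustration. It only shows the invariant such a counter would have to keep:
every path that sets or clears the delayed state must adjust the count
exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; NOT the kernel's struct sched_entity / struct cfs_rq. */
struct toy_se {
	bool on_rq;
	bool sched_delayed;
};

struct toy_cfs_rq {
	unsigned int nr_running;	/* queued entities, delayed ones included */
	unsigned int nr_delayed;	/* hypothetical: queued but delay-dequeued */
};

static void toy_enqueue(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	se->on_rq = true;
	se->sched_delayed = false;
	cfs_rq->nr_running++;
}

/* Entity went to sleep but stays queued until it is picked or dequeued. */
static void toy_set_delayed(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	if (se->on_rq && !se->sched_delayed) {
		se->sched_delayed = true;
		cfs_rq->nr_delayed++;
	}
}

/* Full dequeue; drops the delayed count if the entity was delayed. */
static void toy_dequeue(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	if (!se->on_rq)
		return;
	if (se->sched_delayed) {
		se->sched_delayed = false;
		cfs_rq->nr_delayed--;
	}
	se->on_rq = false;
	cfs_rq->nr_running--;
}

/* What a balance-style check could consult instead of raw nr_running. */
static unsigned int toy_nr_runnable(const struct toy_cfs_rq *cfs_rq)
{
	assert(cfs_rq->nr_delayed <= cfs_rq->nr_running);
	return cfs_rq->nr_running - cfs_rq->nr_delayed;
}
```

In this toy model, a queue holding one runnable and one delayed entity
reports nr_running == 2 but toy_nr_runnable() == 1. The real code has more
transitions than this (a delayed task being woken again, for example), and
each one would need the same single adjustment.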

Thanks.

--
tejun