Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

From: Tejun Heo
Date: Mon Dec 12 2022 - 15:04:06 EST


Hello,

On Mon, Dec 12, 2022 at 01:31:11PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 29, 2022 at 10:22:56PM -1000, Tejun Heo wrote:
> > @@ -11242,3 +11268,38 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
> > {
> > trace_sched_update_nr_running_tp(rq, count);
> > }
> > +
> > +#ifdef CONFIG_SCHED_CLASS_EXT
> > +void sched_deq_and_put_task(struct task_struct *p, int queue_flags,
> > + struct sched_enq_and_set_ctx *ctx)
> > +{
> > + struct rq *rq = task_rq(p);
> > +
> > + lockdep_assert_rq_held(rq);
> > +
> > + *ctx = (struct sched_enq_and_set_ctx){
> > + .p = p,
> > + .queue_flags = queue_flags | DEQUEUE_NOCLOCK,
> > + .queued = task_on_rq_queued(p),
> > + .running = task_current(rq, p),
> > + };
> > +
> > + update_rq_clock(rq);
> > + if (ctx->queued)
> > + dequeue_task(rq, p, queue_flags);
> > + if (ctx->running)
> > + put_prev_task(rq, p);
> > +}
> > +
> > +void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
> > +{
> > + struct rq *rq = task_rq(ctx->p);
> > +
> > + lockdep_assert_rq_held(rq);
> > +
> > + if (ctx->queued)
> > + enqueue_task(rq, ctx->p, ctx->queue_flags);
> > + if (ctx->running)
> > + set_next_task(rq, ctx->p);
> > +}
> > +#endif /* CONFIG_SCHED_CLASS_EXT */
>
> So no. Like the whole __setscheduler_prio() thing, this doesn't make
> sense outside of the core code, policy/class code should never need to
> do this.

Continuing from the __setscheduler_prio() discussion, the need arises from
the fact that whether a task is on SCX or CFS changes depending on whether
the BPF scheduler is loaded or not - e.g. when the BPF scheduler gets
disabled, all tasks that were on it need to be moved back into CFS. There
are different ways to implement this but it needs to be solved somehow.

> Also: https://lkml.kernel.org/r/20220330162228.GH14330@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Yeah, it'd be nice to encapsulate this sequence. The FOR_CHANGE_GUARD naming
throws me off a bit tho as it's not really a loop. Wouldn't it be better to
call it CHANGE_GUARD_BLOCK or sth?

Thanks.

--
tejun