Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

From: Peter Zijlstra
Date: Tue Dec 13 2022 - 06:03:13 EST


On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:

> > But this.. afaict that means that:
> >
> > - the whole EXT thing is incompatible with SCHED_CORE
>
> Can you expand on why this would be? I didn't test against SCHED_CORE, so
> I'm sure some things are broken, but I can't think of a reason why it'd be
> fundamentally incompatible.

For starters, SCHED_CORE doesn't use __pick_next_task() (much). But I
think you're going to have more trouble with prio_less() (which is the
3rd implementation of the scheduling function :/)

> > - the whole EXT thing can be trivially starved by the presence of a
> > single CFS/BATCH/IDLE task.
>
> It's a similar situation w/ RT vs. CFS, which is resolved via RT having
> starvation avoidance.
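To make the starvation argument concrete, here is a toy model of strict class-priority selection in C. It assumes, as the argument above implies, that the EXT class is consulted only after the fair class (which also hosts BATCH and IDLE-policy tasks), and that the fair task never blocks. The enum values and function names are hypothetical; this is not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical, simplified model of strict scheduling-class priority.
 * Classes are listed highest priority first. EXT is assumed to sit
 * below FAIR (which also hosts BATCH and IDLE-policy tasks). The idle
 * class always has the per-CPU idle task, so it is the fallback.
 */
enum class_id { CLASS_RT, CLASS_FAIR, CLASS_EXT, CLASS_IDLE, NR_CLASSES };

/* nr_runnable[c]: number of runnable tasks of class c on this CPU. */
static enum class_id pick_next_class(const int nr_runnable[NR_CLASSES])
{
	/* The first non-empty class wins; lower classes are never asked. */
	for (int c = CLASS_RT; c < CLASS_IDLE; c++)
		if (nr_runnable[c] > 0)
			return (enum class_id)c;
	return CLASS_IDLE;
}

/* Make 'ticks' picks and count how often each class got the CPU. */
static void simulate(const int nr_runnable[NR_CLASSES], int ticks,
		     int picks[NR_CLASSES])
{
	assert(ticks >= 0);
	for (int c = 0; c < NR_CLASSES; c++)
		picks[c] = 0;
	for (int t = 0; t < ticks; t++)
		picks[pick_next_class(nr_runnable)]++;
}
```

With one always-runnable fair task and 100 EXT tasks, every pick goes to the fair class and the EXT tasks get nothing. The class walk itself never lets a lower class catch up, so any relief has to come from an extra mechanism layered on top, as with the RT starvation avoidance mentioned above.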

That is a horrible situation as is. FIFO/RR are very crap scheduling
policies for a number of reasons, but we're stuck with them due to
history and POSIX :-( That is not something you should justify anything
with.

In fact, it should be an example of what to avoid.

Specifically, FIFO/RR fail at the fundamentals of OS
abstractions -- they provide neither resource distribution nor
isolation.

> Here, the way it's handled is a bit different, SCX has
> a watchdog mechanism implemented in "[PATCH 18/31] sched_ext: Implement
> runnable task stall watchdog", so if SCX tasks hang for whatever reason
> including being starved by CFS, it will get aborted and all tasks will be
> handed back to CFS. IOW, it's treated like any other BPF scheduler error
> that can lead to stalls and is recovered the same way.
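The recovery path described above can be modelled roughly as follows. This is a minimal sketch, assuming each task records when it became runnable without getting to run, and that a periodic check compares that wait against a timeout. The struct layout, the `WATCHDOG_TIMEOUT_MS` value and the function name are hypothetical and are not taken from PATCH 18/31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model. Times are monotonic milliseconds. */
enum policy { POLICY_FAIR, POLICY_EXT };

struct task {
	enum policy policy;
	bool waiting;                   /* runnable but has not run yet */
	unsigned long waiting_since_ms; /* valid only when waiting */
};

/* Hypothetical stall threshold, chosen only for illustration. */
#define WATCHDOG_TIMEOUT_MS 30000UL

/*
 * Returns true if any EXT task has waited longer than the timeout.
 * On a stall, every EXT task is switched to the fair policy, which
 * models aborting the BPF scheduler and handing its tasks back.
 */
static bool watchdog_check(struct task *tasks, size_t n, unsigned long now_ms)
{
	bool stalled = false;

	for (size_t i = 0; i < n; i++) {
		const struct task *p = &tasks[i];

		if (p->policy == POLICY_EXT && p->waiting &&
		    now_ms - p->waiting_since_ms > WATCHDOG_TIMEOUT_MS) {
			stalled = true;
			break;
		}
	}

	if (stalled)
		for (size_t i = 0; i < n; i++)
			if (tasks[i].policy == POLICY_EXT)
				tasks[i].policy = POLICY_FAIR;
	return stalled;
}
```

Note that on a stall the check itself rewrites the policy of every EXT task, including ones that were not waiting. That is the behaviour objected to in the reply that follows.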

That all sounds quite terrible.. :/

When the scheduler isn't available, it should be an error to switch a
task to the policy; when there are tasks in the policy, the scheduler
must not go away.

The policy itself should never cause policy changes.
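The rules above can be expressed as a small lifetime model. Entering the policy fails when no scheduler is loaded, unloading fails while any task still uses it, and only explicit requests move tasks out. This is a sketch under stated assumptions: the struct, the function names and the choice of `-ENODEV`/`-EBUSY`/`-ENOENT` are hypothetical and do not describe an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a loadable scheduling policy's lifetime rules. */
struct ext_policy {
	bool loaded;  /* a BPF scheduler is currently registered */
	int nr_tasks; /* tasks currently using the policy */
};

/* Switching a task into the policy is an error when nothing is loaded. */
static int ext_task_enter(struct ext_policy *pol)
{
	if (!pol->loaded)
		return -ENODEV;
	pol->nr_tasks++;
	return 0;
}

/* Only an explicit request moves a task out; the policy never does. */
static void ext_task_leave(struct ext_policy *pol)
{
	assert(pol->nr_tasks > 0);
	pol->nr_tasks--;
}

static int ext_register(struct ext_policy *pol)
{
	if (pol->loaded)
		return -EBUSY;
	pol->loaded = true;
	return 0;
}

/* The scheduler must not go away while tasks still use the policy. */
static int ext_unregister(struct ext_policy *pol)
{
	if (!pol->loaded)
		return -ENOENT;
	if (pol->nr_tasks > 0)
		return -EBUSY;
	pol->loaded = false;
	return 0;
}
```

Under this model, the policy of a task changes only through `ext_task_enter()` and `ext_task_leave()`, never as a side effect of the scheduler's own state. The model does not address what should happen when a loaded BPF scheduler misbehaves, which is what the watchdog in PATCH 18/31 is meant to cover.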