Re: sched_ext: Partial mode priority and fallthrough to EEVDF
From: Tejun Heo
Date: Tue Mar 10 2026 - 14:27:09 EST
Hello, Matt.
On Tue, Mar 10, 2026 at 02:52:13PM +0000, Matt Fleming wrote:
> At Cloudflare we're experimenting with inverting the priority of the
> ext_sched_class and fair_sched_class to allow us to pick SCHED_EXT
> tasks to run before SCHED_NORMAL. This gives us better scheduling
> decisions for those SCHED_EXT tasks where we can embed business logic
> into the BPF program and prevents them from being starved by the larger
> number of SCHED_NORMAL tasks under CPU contention. There are a couple
> of reasons we took this route:
>
> 1. Our workloads are heterogeneous and complex and we can't move entire
> systems to SCHED_EXT in one shot. We want to experiment with running
> SCHED_EXT in partial mode as we progressively onboard more and more
> services (we run multiple services on single machines).
>
> 2. There's no way today (AFAIK) to run in "full mode" and have BPF
> schedulers fall through to EEVDF.
>
> In an ideal world, 2 is what we'd want to do. Is anyone else interested
> in this problem or currently working on it? Is there anything coming in
> the future that would make it easier for those of us slowly
> transitioning to SCHED_EXT?
Hmm... I have a bit of a hard time following how that's different from partial
mode. If you want the scheduler to decide whether a task should be in SCX or
fair, you can do so from ops.init_task() by asserting p->scx.disallow. If
you mean that you want to switch dynamically on each scheduling event, I
don't think that's a good idea given that each hop would be a full sched_class
switch.
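The per-task decision in ops.init_task() can be sketched as a small BPF scheduler, loosely modeled on the disallow_tgid knob in the scx_qmap example. This assumes the sched_ext example tree's scx/common.bpf.h helpers and must be built as a BPF object there, not as a standalone program. The scheduler name "partial" and the fair_tgid knob are hypothetical. Per the kernel's comment on the field, disallow may only be set while the scheduler is loading (!args->fork), so this only covers tasks that already exist at load time.

```c
/* Hypothetical sketch; build inside the sched_ext example tree. */
#include <scx/common.bpf.h>

char _license[] SEC("license") = "GPL";

/* Hypothetical knob, set by the loader before attach: tgid kept on fair. */
const volatile s32 fair_tgid;

s32 BPF_STRUCT_OPS(partial_init_task, struct task_struct *p,
		   struct scx_init_task_args *args)
{
	/*
	 * Only set disallow on the load path. If the task is already
	 * SCHED_EXT, the kernel reverts it to SCHED_NORMAL, and later
	 * sched_setscheduler(SCHED_EXT) calls fail with -EACCES.
	 */
	if (!args->fork && p->tgid == fair_tgid)
		p->scx.disallow = true;
	return 0;
}

SEC(".struct_ops.link")
struct sched_ext_ops partial_ops = {
	.init_task	= (void *)partial_init_task,
	.flags		= SCX_OPS_SWITCH_PARTIAL,	/* only SCHED_EXT tasks */
	.name		= "partial",
};
```

The decision is made once per task rather than per scheduling event, which avoids the repeated sched_class switches mentioned above.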
As for the ordering between the two, I don't know. How are you using partial
mode? No matter how you order them, the behaviors in pathological cases are
pretty bad and I've been thinking that most would use partial mode to
partition the system so that some CPUs are managed by SCX and others by fair
in which case the ordering doesn't matter that much. If you're mixing the
two classes on the same CPUs, I wonder whether this is something which can
be better dealt with by the deadline servers. Andrea, what do you think?
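For the CPU-partitioned setup, a task can opt into SCX from userspace: in partial mode only tasks whose policy is SCHED_EXT are handed to the BPF scheduler. The following is a minimal sketch, assuming a contiguous CPU range is reserved for SCX. The helper names are hypothetical, and keeping fair tasks off those CPUs (e.g. with cpusets) is not shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

#ifndef SCHED_EXT
#define SCHED_EXT 7	/* value from include/uapi/linux/sched.h */
#endif

/* Hypothetical helper: fill @set with CPUs first..last (inclusive). */
void scx_partition(cpu_set_t *set, int first, int last)
{
	CPU_ZERO(set);
	for (int cpu = first; cpu <= last; cpu++)
		CPU_SET(cpu, set);
}

/*
 * Hypothetical helper: pin @pid (0 = caller) to the SCX CPU range, then
 * switch it to SCHED_EXT. Returns 0 or -errno. The affinity change is
 * not rolled back if the policy switch fails. -EACCES can mean the BPF
 * scheduler set p->scx.disallow for this task; -EINVAL is expected on
 * kernels built without sched_ext.
 */
int enter_scx_partition(pid_t pid, int first, int last)
{
	cpu_set_t set;
	struct sched_param param;

	scx_partition(&set, first, last);
	if (sched_setaffinity(pid, sizeof(set), &set))
		return -errno;

	memset(&param, 0, sizeof(param));	/* non-RT policy: priority 0 */
	if (sched_setscheduler(pid, SCHED_EXT, &param))
		return -errno;
	return 0;
}
```

With this split, the relative order of ext_sched_class and fair_sched_class matters little, because the two classes rarely compete for the same CPU.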
Thanks.
--
tejun