Re: [PATCH 00/10] sched: EEVDF using latency-nice

From: Peter Zijlstra
Date: Tue Mar 07 2023 - 08:47:43 EST

Next message: Alexandre Mergnat: "[PATCH 0/6] Add IOMMU support to MT8365 SoC"
Previous message: Corinna Vinschen: "Re: [PATCH] igb: revert rtnl_lock() that causes deadlock"
In reply to: Vincent Guittot: "Re: [PATCH 00/10] sched: EEVDF using latency-nice"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Mar 07, 2023 at 11:27:37AM +0100, Vincent Guittot wrote:
> On Mon, 6 Mar 2023 at 15:17, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > Hi!
> >
> > Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> > not make more sense, and I did point Vincent at some older patches I had for
> > that (which is where his augmented rbtree thing comes from).
> >
> > Also, since I really dislike the dual tree, I also figured we could dynamically
> > switch between an augmented tree and not (and while I have code for that,
> > that's not included in this posting because with the current results I don't
> > think we actually need this).
> >
> > Anyway, since I'm somewhat under the weather, I spent last week desperately
> > trying to connect a small cluster of neurons in defiance of the snot overlord
> > and bring back the EEVDF patches from the dark crypts where they'd been
> > gathering cobwebs for the past 13 odd years.
>
> I haven't studied your patchset in detail yet but at a 1st glance this
> seems to be a major rework on the cfs task placement and the latency
> is just an add-on on top of moving to the EEVDF scheduling.

It completely reworks the base scheduler, placement, preemption, picking
-- everything. The only thing they have in common is that they're both a
virtual time based scheduler.

The big advantage I see is that EEVDF is fairly well known and studied,
and a much better defined scheduler than WFQ. Specifically, WFQ is only
well defined in how much time is given to any task (bandwidth); it says
nothing about how that is distributed in time. That is, there is no
native preemption condition/constraint etc. -- all that code we have is
mostly random heuristics.

The WF2Q/EEVDF class of schedulers otoh *do* define all that. There is a
lot less wiggle room as a result. The avg_vruntime / placement stuff I
did is fundamental to how it controls bandwidth distribution and
guarantees the WFQ subset. Specifically, by limiting the pick to that
subset of tasks that has positive lag (owed time), it guarantees this
fairness -- but that means we need a working measure of lag.
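To make the pick rule above concrete, here is a minimal sketch in Python
(not the kernel code). It assumes the usual definitions: the average
vruntime V is the weighted mean of the tasks' vruntimes, lag is
w_i * (V - v_i), and among eligible tasks the one with the earliest
virtual deadline wins. The Task fields, the 'request' slice length and
the function names are all made up for illustration, and a task with
exactly zero lag is treated as eligible here.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    weight: int     # relative share of the CPU
    vruntime: int   # virtual time consumed so far
    request: int    # requested slice length (hypothetical field)

def is_eligible(task, tasks):
    """True if lag = w_i * (V - v_i) >= 0, with V the weighted average vruntime.

    Compared without dividing: sum(w_j * v_j) >= v_i * sum(w_j).
    """
    total_weight = sum(t.weight for t in tasks)
    weighted_sum = sum(t.weight * t.vruntime for t in tasks)
    return weighted_sum >= task.vruntime * total_weight

def virtual_deadline(task):
    """Virtual time by which the requested slice should be complete."""
    return task.vruntime + task.request / task.weight

def pick_eevdf(tasks):
    """Restrict to eligible tasks, then take the earliest virtual deadline."""
    eligible = [t for t in tasks if is_eligible(t, tasks)]
    return min(eligible, key=virtual_deadline)
```

Because the weighted lags sum to zero, at least one task is always
eligible, so the pick never comes up empty on a non-empty runqueue. Note
that a task with the earliest deadline overall is still skipped if its
lag is negative.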

Similarly, since the whole 'when' thing is well defined in order to
provide the additional latency goals of these schedulers, placement is
crucial. Things like the sleeper bonus are fundamentally incompatible with
latency guarantees -- both affect the 'when'.
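A rough way to see the conflict, sketched in Python under simplifying
assumptions (neither function is the kernel's placement code, and all
names and parameters are hypothetical): a sleeper-bonus style placement
puts a waking task some fixed amount before the minimum vruntime, while
a lag-based placement puts it relative to the average vruntime at
whatever lag it had when it left.

```python
def place_with_sleeper_bonus(min_vruntime, old_vruntime, bonus):
    """Heuristic placement: wake up to `bonus` ahead of min_vruntime,
    but never move the task backward from its own old vruntime."""
    return max(old_vruntime, min_vruntime - bonus)

def place_preserving_lag(avg_vruntime, saved_vlag):
    """Lag-based placement: re-enter at the same virtual lag the task
    had when it left (positive = owed time, negative = ran ahead).
    This ignores the shift in the average caused by the task joining."""
    return avg_vruntime - saved_vlag
```

With the bonus, a task that left after overrunning its share comes back
ahead of everyone else anyway, which moves its 'when' independently of
what it is owed; with lag-based placement it comes back behind the
average by the same amount it overran.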

Initial EEVDF paper is here:

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564

It contains a few 'mistakes' and oversights, but those should not
matter.

Anyway, I'm still struggling to make complete sense of what you did --
will continue to stare at that.
