Re: [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr

From: Qais Yousef
Date: Wed May 31 2023 - 15:49:31 EST


On 05/31/23 13:58, Peter Zijlstra wrote:
> Hi!
>
> Latest version of the EEVDF [1] patches.
>
> The only real change since last time is the fix for tick-preemption [2], and a
> simple safe-guard for the mixed slice heuristic.
>
> Other than that, I've re-arranged the patches to make EEVDF come first and have
> the latency-nice or slice-attribute patches on top.
>
> Results should not be different from last time around, lots of people ran them
> and found no major performance issues; what was found was better latency and
> smaller variance (probably due to the more stable latency).
>
> I'm hoping we can start queueing this part.
>
> The big question is what additional interface to expose; some people have
> voiced objections to the latency-nice interface, the 'obvious' alternative
> is to directly expose the slice length as a request/hint.

I haven't thought much about this, but..

There are two desired properties from user space that I hope we can achieve
with this new hint:

1. The obvious one: improve the wake-up latencies of some tasks.
2. Hint that some tasks are happy to suffer high latencies.

2 is important because one thing user space wants to do, but lacks a mechanism
for, is to hint that some tasks are background tasks which can be shuffled
around and preempted more often, at the expense of keeping other tasks
happier.

I'm hoping this + uclamp_max would then be a reasonable way to tag such tasks
so that they consume less power and can be kept 'out of the way' when
necessary.
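
Something along these lines is what I have in mind from the user space side.
A sketch only: the latency field and flag are taken from the proposed
latency-nice series and are not a merged ABI; the uclamp_max part is mainline:

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <linux/sched.h>	/* SCHED_NORMAL, SCHED_FLAG_UTIL_CLAMP_MAX */

	/* Not mainline: value taken from the proposed latency-nice series. */
	#ifndef SCHED_FLAG_LATENCY_NICE
	#define SCHED_FLAG_LATENCY_NICE	0x80
	#endif

	struct sched_attr {		/* <linux/sched/types.h> layout + proposed field */
		uint32_t size;
		uint32_t sched_policy;
		uint64_t sched_flags;
		int32_t  sched_nice;
		uint32_t sched_priority;
		uint64_t sched_runtime;
		uint64_t sched_deadline;
		uint64_t sched_period;
		uint32_t sched_util_min;
		uint32_t sched_util_max;
		int32_t  sched_latency_nice;	/* proposed, not mainline */
	};

	static int tag_background(pid_t pid)
	{
		struct sched_attr attr = {
			.size			= sizeof(attr),
			.sched_policy		= SCHED_NORMAL,
			.sched_flags		= SCHED_FLAG_UTIL_CLAMP_MAX |
						  SCHED_FLAG_LATENCY_NICE,
			.sched_util_max		= 128,	/* cap util: steer to efficient CPUs */
			.sched_latency_nice	= 19,	/* "happy to suffer latency" */
		};

		return syscall(SYS_sched_setattr, pid, &attr, 0);
	}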

Slice length seems a bit counter-intuitive for these use cases. Maybe expose
the lag instead? Not sure that is neutral enough, though, to allow moving away
from EEVDF in the future if we ever need to. 'deadline' probably sounds too
promising.
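
For reference, this is how I understand the candidate knobs relate in EEVDF.
A simplified sketch of the paper's definitions, not the patch code verbatim
(names are mine):

	struct entity {
		long long weight;	/* w_i */
		long long vruntime;	/* v_i: virtual time already received */
		long long slice;	/* r_i: service requested per scheduling cycle */
	};

	/* Virtual deadline: the point in virtual time by which the current
	 * request of 'slice' service should have been delivered. Smaller
	 * slice => earlier deadline => better wake-up latency, but shorter
	 * runs once selected. */
	static long long vdeadline(const struct entity *se)
	{
		return se->vruntime + se->slice / se->weight;
	}

	/* Lag against the queue's average virtual time V: positive means the
	 * entity is owed service (eligible), negative means it ran ahead. */
	static long long vlag(long long V, const struct entity *se)
	{
		return V - se->vruntime;
	}

So the slice is the only per-task input; lag and deadline are derived
scheduler bookkeeping, which is part of why exposing them as ABI feels risky
to me.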

Also, we had in mind to use these hints in EAS to avoid packing (though it has
to be re-evaluated whether that is still needed), which is another source of
latency. I think Oracle wanted to use them to control the search depth at load
balance, IIRC - not sure if they still want that.

Not sure where we stand on multiple users of this hint now. Should they be new
hints? If so, are we okay with continuing to extend sched_setattr(), or would
it be better to create a proper QoS framework?
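
Mechanically, at least, extending sched_setattr() stays cheap because the ABI
is already versioned through sched_attr.size. Roughly (simplified from
kernel/sched/core.c:sched_copy_attr(), not the exact code):

	static int copy_attr(struct sched_attr *attr,
			     struct sched_attr __user *uattr, u32 usize)
	{
		if (usize < SCHED_ATTR_SIZE_VER0)
			return -EINVAL;

		/*
		 * copy_struct_from_user():
		 *  - old userspace (usize < sizeof(*attr)): missing new
		 *    fields are zero-filled, i.e. "no hint";
		 *  - new userspace on an old kernel (usize > sizeof(*attr)):
		 *    fails with -E2BIG unless the unknown tail is all zeroes.
		 */
		return copy_struct_from_user(attr, sizeof(*attr), uattr, usize);
	}

So old binaries keep working and new fields default to "no hint"; the open
question is more about semantics than mechanics.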

As a side note - I started working on some patches to generalize the load
balancer's misfit path. The issues we see:

1. Latency-sensitive tasks ending up on the same CPU can suffer. We need both
the wake-up and load-balance paths to be aware of this and help with
spreading.
2. Busy uclamp_max tasks can end up stuck on the wrong core with no ability to
move them around. I have yet to see this in practice, but it's a concern for
wider deployment (which hasn't happened yet).

I think the misfit path is the right way to handle these cases. But I could be
wrong (or maybe you already take care of this and I just need to read the
patches more carefully). For the wake-up side of things, I think we need to be
careful not to cram latency-sensitive tasks onto the same rq if we can avoid
it. I didn't spot that in Vincent's patches, nor in yours (which I have yet to
look at more closely).
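
To make the "generalize misfit" idea a bit more concrete, something along
these lines. Purely hypothetical sketch: task_is_latency_sensitive() and the
enum don't exist anywhere, the thresholds are arbitrary placeholders, and only
task_fits_cpu()/uclamp_eff_value()/capacity_orig_of() are real helpers:

	enum misfit_reason {
		MISFIT_NONE,
		MISFIT_CAPACITY,	/* today's case: task doesn't fit this CPU */
		MISFIT_POWER,		/* capped (uclamp_max) busy task on a big core */
		MISFIT_LATENCY,		/* latency-sensitive task on a crowded rq */
	};

	static enum misfit_reason task_misfit_reason(struct task_struct *p,
						     struct rq *rq)
	{
		int cpu = cpu_of(rq);

		if (!task_fits_cpu(p, cpu))
			return MISFIT_CAPACITY;

		/* uclamp_max says "I don't need this much CPU"; keeping the
		 * task on a big core wastes power and the core itself. */
		if (uclamp_eff_value(p, UCLAMP_MAX) < capacity_orig_of(cpu) / 2)
			return MISFIT_POWER;

		/* Issue 1 above: don't let latency-sensitive tasks pile up. */
		if (task_is_latency_sensitive(p) && rq->cfs.h_nr_running > 1)
			return MISFIT_LATENCY;

		return MISFIT_NONE;
	}

The load balancer could then prioritize pulling based on the reason rather
than only on capacity, but as said, this still has to prove itself in
practice.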

If I may piggy-back a bit more: we seem to reserve a policy number for
SCHED_ISO, which AFAICT had an old proposal to better tag interactive tasks.
Is this something waiting for someone to come forward with a proposal again?
I have to say, the idea of an interactive policy looks attractive. I sometimes
wonder if we need to go vertical (new policy/sched_class) instead of
horizontal - or maybe both.

Sorry for the bit of a brain dump here. HTH.


Cheers

--
Qais Yousef

>
> The very last patch implements this alternative using sched_attr::sched_runtime
> but is untested.
>
> Diffstat for the base patches [1-11]:
>
> include/linux/rbtree_augmented.h | 26 +
> include/linux/sched.h | 7 +-
> kernel/sched/core.c | 2 +
> kernel/sched/debug.c | 48 +-
> kernel/sched/fair.c | 1105 ++++++++++++++++++--------------------
> kernel/sched/features.h | 24 +-
> kernel/sched/sched.h | 16 +-
> 7 files changed, 587 insertions(+), 641 deletions(-)
>
>
> [1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
>
> [2] https://lkml.kernel.org/r/20230420150537.GC4253%40hirez.programming.kicks-ass.net
>