Re: [PATCH] sched/eevdf: Toggle eligibility through sched_feat

From: Tor Vic
Date: Mon Oct 16 2023 - 09:34:40 EST




On 10/15/23 12:44, Peter Zijlstra wrote:
On Thu, Oct 12, 2023 at 10:02:13PM -0500, Youssef Esmat wrote:
Interactive workloads see performance gains by disabling eligibility
checks (EEVDF->EVDF). Disabling the checks reduces the number of
context switches and delays less important work (higher deadlines/nice
values) in favor of more important work (lower deadlines/nice values).

That said, disabling the checks can add large latencies for some workloads,
so the default keeps eligibility on while allowing it to be turned off when
beneficial.

Signed-off-by: Youssef Esmat <youssefesmat@xxxxxxxxxxxx>
Link: https://lore.kernel.org/lkml/CA+q576MS0-MV1Oy-eecvmYpvNT3tqxD8syzrpxQ-Zk310hvRbw@xxxxxxxxxxxxxx/
---
kernel/sched/fair.c | 3 +++
kernel/sched/features.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a751e552f253..16106da5a354 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -728,6 +728,9 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	s64 avg = cfs_rq->avg_vruntime;
 	long load = cfs_rq->avg_load;
+	if (!sched_feat(ENFORCE_ELIGIBILITY))
+		return 1;
+
 	if (curr && curr->on_rq) {
 		unsigned long weight = scale_load_down(curr->load.weight);

Right.. could you pretty please try:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/eevdf

as of yesterday or so.

It defaults to (EEVDF relevant features):

SCHED_FEAT(PLACE_LAG, true)
SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
SCHED_FEAT(PREEMPT_SHORT, true)
SCHED_FEAT(PLACE_SLEEPER, false)
SCHED_FEAT(GENTLE_SLEEPER, true)
SCHED_FEAT(EVDF, false)
SCHED_FEAT(DELAY_DEQUEUE, true)
SCHED_FEAT(GENTLE_DELAY, true)

If that doesn't do well enough, could you please try, in order of
preference:

2) NO_GENTLE_DELAY
3) NO_DELAY_DEQUEUE, PLACE_SLEEPER
4) NO_DELAY_DEQUEUE, PLACE_SLEEPER, NO_GENTLE_SLEEPER

I'm very interested in this scheduler stuff, but I know nothing about the code.

Still, I ran some very quick benchmarks on a dual-core Skylake laptop running 6.6-rc6.
Base slice is 5 ms.

1) Without the recent patches from Peter's tree
2) With patches, default features
3) With patches, NO_GENTLE_DELAY
4) With patches, NO_DELAY_DEQUEUE + PLACE_SLEEPER
5) With patches, like 4) + NO_GENTLE_SLEEPER
6) With patches, like 5) + EVDF
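For anyone wanting to repeat configurations 3) to 6), here is a minimal sketch of how the flags can be flipped at runtime. It assumes root, a mounted debugfs, a kernel built with CONFIG_SCHED_DEBUG, and Peter's sched/eevdf queue applied (the flag names come from the list above and are not in mainline 6.6). The helper name set_sched_feats and the SCHED_FEATURES variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: toggle scheduler features by writing their names to the
# debugfs "features" file. SCHED_FEATURES is a hypothetical override
# variable; the default is the path used by recent kernels (older ones
# exposed /sys/kernel/debug/sched_features instead).
SCHED_FEATURES="${SCHED_FEATURES:-/sys/kernel/debug/sched/features}"

# set_sched_feats FLAG...
# Writes each flag separately, because the kernel parses one feature
# name per write. Prefix a name with NO_ to turn that feature off.
set_sched_feats() {
    for feat in "$@"; do
        printf '%s\n' "$feat" > "$SCHED_FEATURES" || return 1
    done
}

# Example calls matching the list above (run as root):
#   set_sched_feats NO_GENTLE_DELAY                                  # 3)
#   set_sched_feats NO_DELAY_DEQUEUE PLACE_SLEEPER                   # 4)
#   set_sched_feats NO_DELAY_DEQUEUE PLACE_SLEEPER NO_GENTLE_SLEEPER # 5)
#   set_sched_feats NO_DELAY_DEQUEUE PLACE_SLEEPER NO_GENTLE_SLEEPER EVDF # 6)
```

Reading the same file back with cat shows the current state of every feature, which is a quick way to confirm that a write took effect.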

$ perf stat -r 7 -e cs,migrations,cache-misses,branch-misses -- perf bench sched messaging -g 20 -l 1000 -p

test | seconds | cs   | migrations | cache miss | branch miss |
-----|---------|------|------------|------------|-------------|
1)   | 2.90    | 192K | 6.7K       | 99M        | 60M         |
2)   | 2.97    | 226K | 6.9K       | 102M       | 61M         |
3)   | 3.00    | 247K | 6.9K       | 108M       | 62M         |
4)   | 2.92    | 182K | 7.2K       | 101M       | 60M         |
5)   | 2.94    | 203K | 6.8K       | 101M       | 60M         |
6)   | 2.79    | 84K  | 6.4K       | 94M        | 57M         |


$ stress-ng --bsearch 2 --matrix 2 --matrix-method prod --timeout 30 --metrics-brief

(results in bogo ops/s)

test | bsearch | matrix |
-----|---------|--------|
1)   | 392     | 588    |
2)   | 512     | 688    |
3)   | 512     | 663    |
4)   | 512     | 688    |
5)   | 511     | 686    |
6)   | 510     | 655    |
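To put the numbers in both tables into perspective, a small helper like the one below can express any row as a percentage change against test 1). This is only a convenience sketch: pct_change is a name invented here, and the values have to be passed with a decimal point (2.90, not 2,90) and without the K/M suffix.

```shell
#!/bin/sh
# pct_change BASE NEW
# Prints the relative change from BASE to NEW in percent, one decimal.
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Examples using figures from the tables:
#   pct_change 2.90 2.79   # seconds, test 1) vs 6): prints -3.8
#   pct_change 392 512     # bsearch, test 1) vs 2): prints 30.6
```

A negative result means the second value is lower than the baseline, which is an improvement for seconds and context switches but a regression for bogo ops/s.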

--

I don't know if this info is useful enough for you scheduler people, but I hope it helps.

Cheers,
Tor


I really don't like the EVDF option, and I think you'll end up
regretting using it sooner rather than later, just to make this one
benchmark you have happy.

I'm hoping the default is enough, but otherwise any of the above should
be a *much* better scheduler.

Also, bonus points if you can create us a standalone benchmark that
captures your metric (à la Facebook's schbench) without the whole
chrome nonsense, that'd be epic.