Re: [PATCH 00/24] Complete EEVDF

From: K Prateek Nayak
Date: Sun Nov 10 2024 - 23:08:09 EST

Hello Sam,

On 11/9/2024 4:47 AM, Samuel Wu wrote:
On Thu, Nov 7, 2024 at 11:08 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:

On Wed, Nov 6, 2024 at 4:07 AM Luis Machado <luis.machado@xxxxxxx> wrote:


On 11/6/24 11:09, Peter Zijlstra wrote:
On Wed, Nov 06, 2024 at 11:49:00AM +0530, K Prateek Nayak wrote:

Since delayed entities are still on the runqueue, they can affect PELT
calculation. Vincent and Dietmar have both noted this and Peter posted
a patch in response, but it was pulled out since Luis reported observing
negative values for h_nr_delayed on his setup. A lot has been fixed around
delayed dequeue since then, and I wonder if now would be the right time to
re-attempt h_nr_delayed tracking.

Yeah, it's something I meant to get back to. I think the patch as posted
was actually right and it didn't work for Luis because of some other,
since fixed issue.

But I might be misremembering things. I'll get to it eventually :/

Sorry for the late reply, I got sidetracked on something else.

There have been a few power regressions (based on our Pixel6-based testing) due
to the delayed-dequeue series.

The main one drove the frequencies up due to an imbalance in the uclamp inc/dec
handling. That has since been fixed by "[PATCH 10/24] sched/uclamg: Handle delayed dequeue". [1]

The bug also made it so disabling DELAY_DEQUEUE at runtime didn't fix things, because the
imbalance/stale state would be perpetuated. Disabling DELAY_DEQUEUE before boot did fix things.
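
To illustrate why a runtime toggle cannot undo this kind of bug, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyClampBucket`, `sleep_and_wake()` and `simulate()` are hypothetical names, and the "buggy" path (an increment with no matching decrement on a delayed dequeue) is a simplified stand-in, not the actual kernel code path.

```python
class ToyClampBucket:
    """Hypothetical stand-in for a per-runqueue refcount (e.g. a clamp bucket)."""

    def __init__(self):
        self.tasks = 0             # tasks currently accounted in the bucket
        self.delay_dequeue = True  # toy version of the DELAY_DEQUEUE feature flag

    def inc(self):
        self.tasks += 1

    def dec(self):
        self.tasks -= 1

    def sleep_and_wake(self, buggy):
        """One sleep/wake cycle of a task that was accounted before sleeping."""
        if self.delay_dequeue:
            # Delayed dequeue: the task stays queued while it sleeps.
            if not buggy:
                self.dec()  # balanced path drops the count when the task sleeps
            self.inc()      # wake-up path counts the task again
        else:
            self.dec()      # plain dequeue
            self.inc()      # plain enqueue


def simulate(buggy, cycles_on, cycles_off):
    """Return the counter after running with the flag on, then after turning it off."""
    bucket = ToyClampBucket()
    bucket.inc()  # one runnable task to start with
    for _ in range(cycles_on):
        bucket.sleep_and_wake(buggy)
    after_on = bucket.tasks
    bucket.delay_dequeue = False  # runtime toggle: existing state is not reset
    for _ in range(cycles_off):
        bucket.sleep_and_wake(buggy)
    return after_on, bucket.tasks


print(simulate(buggy=True, cycles_on=100, cycles_off=100))   # (101, 101)
print(simulate(buggy=False, cycles_on=100, cycles_off=100))  # (1, 1)
```

In the toy model the counter drifts only while the flag is on, but the drift survives the runtime toggle because nothing recomputes the counter. Starting with the flag off never creates the stale state in the first place.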

So power use was brought down by the above fix, but some issues still remained, like the
accounting issues with h_nr_running and not taking sched_delayed tasks into account.

Dietmar addressed some of it with "kernel/sched: Fix util_est accounting for DELAY_DEQUEUE". [2]

Peter sent another patch to add accounting for sched_delayed tasks [3]. Though the patch was
mostly correct, under some circumstances [4] we spotted imbalances in the sched_delayed
accounting that slowly drove frequencies up again.

If I recall correctly, Peter has pulled that particular patch from the tree, but we should
definitely revisit it with a proper fix for the imbalance. Suggestion in [5].


Thanks for the replies. We are trying to disable DELAY_DEQUEUE and
re-collect the data to see if that's the cause. We'll get back to this
thread once we have some data.


The test data is back to its pre-EEVDF state with DELAY_DEQUEUE disabled.

Same test example as before, with the thread affined to the big cluster:
| Statistic       | DELAY_DEQUEUE enabled | DELAY_DEQUEUE disabled |
| 5th percentile  |                    96 |                    143 |
| Median          |                   144 |                    147 |
| Mean            |                   134 |                    147 |
| 95th percentile |                   150 |                    150 |
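
For a quick sense of scale, the relative difference between the DELAY_DEQUEUE enabled and disabled columns can be computed as below. This is a small sketch: the numbers are copied from the table above, the unit of the metric is not stated in this message, and `relative_change()` is a helper name introduced here for illustration.

```python
# Values copied from the table: (DELAY_DEQUEUE enabled, DELAY_DEQUEUE disabled)
results = {
    "5th percentile": (96, 143),
    "Median": (144, 147),
    "Mean": (134, 147),
    "95th percentile": (150, 150),
}


def relative_change(enabled, disabled):
    """Percentage change of the enabled run relative to the disabled run."""
    return (enabled - disabled) / disabled * 100.0


for name, (enabled, disabled) in results.items():
    print(f"{name:16s} {relative_change(enabled, disabled):+.1f}%")
```

This prints -32.9% for the 5th percentile, -2.0% for the median, -8.8% for the mean and +0.0% for the 95th percentile, so the gap is concentrated in the low tail rather than in the best-case samples.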

What are the next steps to bring this behavior back? Will DELAY_DEQUEUE always
be enabled by default and/or is there a fix coming for 6.12?

DELAY_DEQUEUE should be enabled by default from v6.12, but a few fixes
for it are still in flight. Could you try running with the changes
from [1] and [2] and see whether you can still reproduce the behavior
and, if so, whether it is equally bad?

Both changes apply cleanly for me on top of current sched/core
at commit fe9beaaa802d ("sched: No PREEMPT_RT=y for all{yes,mod}config")
when applied in that order.



Thanks and Regards,