Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

From: Luis Machado
Date: Thu May 23 2024 - 05:07:06 EST


Peter,

On 5/23/24 09:45, Peter Zijlstra wrote:
> On Mon, Apr 29, 2024 at 03:33:04PM +0100, Luis Machado wrote:
>
>> (2) m6.6-eevdf-complete: m6.6-stock plus this series.
>> (3) m6.6-eevdf-complete-no-delay-dequeue: (2) + NO_DELAY_DEQUEUE
>
>> +------------+------------------------------------------------------+-----------+
>> | cluster | tag | perc_diff |
>> +------------+------------------------------------------------------+-----------+
>> | CPU | m6.6-stock | 0.0% |
>> | CPU-Big | m6.6-stock | 0.0% |
>> | CPU-Little | m6.6-stock | 0.0% |
>> | CPU-Mid | m6.6-stock | 0.0% |
>> | GPU | m6.6-stock | 0.0% |
>> | Total | m6.6-stock | 0.0% |
>
>> | CPU | m6.6-eevdf-complete-no-delay-dequeue | 117.77% |
>> | CPU-Big | m6.6-eevdf-complete-no-delay-dequeue | 113.79% |
>> | CPU-Little | m6.6-eevdf-complete-no-delay-dequeue | 97.47% |
>> | CPU-Mid | m6.6-eevdf-complete-no-delay-dequeue | 189.0% |
>> | GPU | m6.6-eevdf-complete-no-delay-dequeue | -6.74% |
>> | Total | m6.6-eevdf-complete-no-delay-dequeue | 103.84% |
>
> This one is still flummoxing me. I've gone over the patch a few times on
> different days and I'm not seeing it. Without DELAY_DEQUEUE it should
> behave as before.
>
> Let me try and split this patch up into smaller parts such that you can
> try and bisect this.
>

Same situation on my end. I've been chasing this for some time and I don't fully
understand why things go off the rails energy-wise as soon as DELAY_DEQUEUE is
enabled, now that the load_avg accounting red herring is gone.

I do have one additional piece of information though. Hopefully it will be useful.

If I boot the kernel with NO_DELAY_DEQUEUE (i.e. the feature defaulting to false),
things work fine. If I then switch to DELAY_DEQUEUE at runtime, power usage goes up a lot.

The interesting bit is that if I switch back to NO_DELAY_DEQUEUE at runtime, things
don't go back to normal. They stay as they were, using a lot more energy.
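
For anyone wanting to reproduce the runtime switch, here is a minimal sketch of one way
to do it. It assumes debugfs is mounted at /sys/kernel/debug, that the scheduler
features file lives at sched/features under it, and that the script runs as root. The
helper names (feature_enabled, set_feature) are made up for illustration and are not
part of any existing tool.

```python
from pathlib import Path

# Assumed location of the scheduler features file (debugfs mounted, root needed).
SCHED_FEATURES = Path("/sys/kernel/debug/sched/features")


def feature_enabled(features_text: str, name: str) -> bool:
    """Report whether `name` is on, given the contents of the features file.

    The file lists each feature as either NAME (on) or NO_NAME (off),
    separated by whitespace.
    """
    tokens = features_text.split()
    if name in tokens:
        return True
    if f"NO_{name}" in tokens:
        return False
    raise KeyError(f"{name} not found in sched features")


def set_feature(name: str, enable: bool, path: Path = SCHED_FEATURES) -> None:
    """Flip a feature by writing NAME (enable) or NO_NAME (disable) to the file."""
    path.write_text(name if enable else f"NO_{name}")


def toggle_sequence(path: Path = SCHED_FEATURES) -> None:
    """Replay the boot-off -> on -> off sequence described above."""
    set_feature("DELAY_DEQUEUE", True, path)   # switch the feature on at runtime
    set_feature("DELAY_DEQUEUE", False, path)  # switch it back off
```

Reading the file back with feature_enabled(SCHED_FEATURES.read_text(), "DELAY_DEQUEUE")
after each write confirms the state before taking an energy measurement.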

I wonder if we're leaving some unbalanced state somewhere while DELAY_DEQUEUE is on,
something that is signalling we have more load/utilization than we actually do.

The PELT signals look reasonable from what I can see. We don't seem to be boosting
frequencies, but we're running things mostly on big cores with DELAY_DEQUEUE on.

I'll keep investigating this. Please let me know if you need some additional data or
testing and I can get that going.