Re: [PATCH 10/24] sched/uclamg: Handle delayed dequeue

From: Luis Machado
Date: Wed Sep 11 2024 - 04:36:32 EST


On 9/10/24 15:05, Peter Zijlstra wrote:
> On Tue, Sep 10, 2024 at 12:04:11PM +0100, Luis Machado wrote:
>> I gave the above patch a try on our Android workload running on the Pixel 6 with a 6.8-based kernel.
>>
>> First I'd like to confirm that Dietmar's fix that was pushed to tip:sched/core (Fix util_est
>> accounting for DELAY_DEQUEUE) helps bring the frequencies and power use down to more sensible levels.
>>
>> As for the above changes, unfortunately I'm seeing high frequencies and high power usage again. The
>> pattern looks similar to what we observed with the uclamp inc/dec imbalance.
>
> :-(
>
>> I haven't investigated this in depth yet, but I'll go stare at some traces and the code, and hopefully
>> something will ring bells.
>
> So first thing to do is trace h_nr_delayed I suppose, in my own
> (limited) testing that was mostly [0,1] correctly correlating to there
> being a delayed task on the runqueue.
>
> I'm assuming that removing the usage sites restores function?

It does restore function if we remove the usage.

From an initial look:

cat /sys/kernel/debug/sched/debug | grep -i delay
.h_nr_delayed : -4
.h_nr_delayed : -6
.h_nr_delayed : -1
.h_nr_delayed : -6
.h_nr_delayed : -1
.h_nr_delayed : -1
.h_nr_delayed : -5
.h_nr_delayed : -6

All of the counts are negative, so there is probably an unexpected decrement or a missing increment somewhere.
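
For anyone wanting to repeat the check, here is a rough sketch of a filter that flags only the negative counters. It assumes the debug file prints the field as ".h_nr_delayed : <value>", as in the output above. The function name find_negative_delayed is made up for illustration and is not part of any kernel tooling.

```shell
# Hypothetical helper: read sched debug text on stdin and report every
# h_nr_delayed value below zero. Returns 0 if at least one negative
# value was found, 1 otherwise.
find_negative_delayed() {
    awk '
        # Assumed line layout: ".h_nr_delayed : <value>"
        $1 == ".h_nr_delayed" && $2 == ":" && ($3 + 0) < 0 {
            found++
            print "negative h_nr_delayed: " $3
        }
        END { exit (found > 0 ? 0 : 1) }
    '
}
```

Usage would be something like "find_negative_delayed < /sys/kernel/debug/sched/debug" (needs root and debugfs mounted). A counter that tracks delayed tasks should never drop below zero, so any line printed here points at the imbalance.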