Re: [PATCH 10/24] sched/uclamp: Handle delayed dequeue

From: Luis Machado
Date: Wed Sep 11 2024 - 05:39:58 EST


On 9/11/24 10:10, Mike Galbraith wrote:
> On Wed, 2024-09-11 at 10:45 +0200, Peter Zijlstra wrote:
>> On Wed, Sep 11, 2024 at 09:35:16AM +0100, Luis Machado wrote:
>>>>
>>>> I'm assuming that removing the usage sites restores function?
>>>
>>> It does restore function if we remove the usage.
>>>
>>> From an initial look:
>>>
>>> cat /sys/kernel/debug/sched/debug | grep -i delay
>>>   .h_nr_delayed                  : -4
>>>   .h_nr_delayed                  : -6
>>>   .h_nr_delayed                  : -1
>>>   .h_nr_delayed                  : -6
>>>   .h_nr_delayed                  : -1
>>>   .h_nr_delayed                  : -1
>>>   .h_nr_delayed                  : -5
>>>   .h_nr_delayed                  : -6
>>>
>>> So probably an unexpected decrement or lack of an increment somewhere.
>>
>> Yeah, that's buggered. Ok, I'll go rebase sched/core and take this patch
>> out. I'll see if I can reproduce that.
>
> Hm, would be interesting to know how the heck he's triggering that.
>
> My x86_64 box refuses to produce any such artifacts with anything I've
> tossed at it, including full LTP with enterprise RT and !RT configs,
> both in master and my local SLE15-SP7 branch. Hohum.
>
> -Mike

From what I can tell, the decrement that makes h_nr_delayed go negative is in
the dequeue_entities path.

First:

	if (!task_sleep && !task_delayed)
		h_nr_delayed = !!se->sched_delayed;

h_nr_delayed is 1 here.

Then we decrement cfs_rq->h_nr_delayed below:

	cfs_rq->h_nr_running -= h_nr_running;
	cfs_rq->idle_h_nr_running -= idle_h_nr_running;
	cfs_rq->h_nr_delayed -= h_nr_delayed;
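
To make the arithmetic concrete, here is a minimal userspace sketch of the two
fragments above. It is not kernel code: struct toy_se, struct toy_cfs_rq and
toy_dequeue() are hypothetical stand-ins that model only the fields mentioned
here. It also assumes the matching increment never happened, which is one of
the two possibilities raised earlier in the thread and has not been confirmed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * discussed in this thread are modelled. */
struct toy_se {
	bool sched_delayed;
};

struct toy_cfs_rq {
	int h_nr_delayed;
};

/*
 * Mirrors the two quoted fragments: compute the local h_nr_delayed
 * delta, then subtract it from the per-runqueue counter.
 * Returns the counter value after the subtraction.
 */
static int toy_dequeue(struct toy_cfs_rq *cfs_rq, const struct toy_se *se,
		       bool task_sleep, bool task_delayed)
{
	int h_nr_delayed = 0;

	if (!task_sleep && !task_delayed)
		h_nr_delayed = !!se->sched_delayed;

	/* If no earlier increment accounted for this entity (assumption),
	 * this subtraction takes the counter below zero. */
	cfs_rq->h_nr_delayed -= h_nr_delayed;

	return cfs_rq->h_nr_delayed;
}
```

Starting from a counter of 0 with se->sched_delayed set, a single call leaves
the counter at -1. If the counter had been incremented to 1 beforehand, the
same call brings it back to 0.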