Re: [PATCH] sched/fair: fix runnable_avg for throttled cfs

From: Dietmar Eggemann
Date: Thu Feb 27 2020 - 10:15:43 EST


On 27.02.20 13:12, Vincent Guittot wrote:
> On Thu, 27 Feb 2020 at 14:10, Tao Zhou <zhout@xxxxxxxxxxx> wrote:
>>
>> Hi Dietmar,
>>
>> On Thu, Feb 27, 2020 at 11:20:05AM +0000, Dietmar Eggemann wrote:
>>> On 26.02.20 21:01, Vincent Guittot wrote:
>>>> On Wed, 26 Feb 2020 at 20:04, <bsegall@xxxxxxxxxx> wrote:
>>>>>
>>>>> Vincent Guittot <vincent.guittot@xxxxxxxxxx> writes:
>>>>>
>>>>>> When a cfs_rq is throttled, its group entity is dequeued and its running
>>>>>> tasks are removed. We must update runnable_avg with current h_nr_running
>>>>>> and update group_se->runnable_weight with new h_nr_running at each level
>>>
>>> ^^^
>>>
>>> Shouldn't this be 'current' rather than 'new' h_nr_running for
>>> group_se->runnable_weight? IMHO, you want to cache the current value
>>> before you add/subtract task_delta.
>>
>> /me think Vincent is right. h_nr_running is updated in the previous
>> level or out. The next level will use current h_nr_running to update
>> runnable_avg and use the new group cfs_rq's h_nr_running which was
>> updated in the previous level or out to update se runnable_weight.

Ah OK, 'old' as in the 'old' cached value of se->runnable_weight, and
'new' as in the 'new' se->runnable_weight, which gets updated *after*
update_load_avg() and before the +/- task_delta.


So when we throttle e.g. /tg1/tg11

previous level is: /tg1/tg11

next level: /tg1


loop for /tg1:

  for_each_sched_entity(se)
    cfs_rq = cfs_rq_of(se);

    update_load_avg(cfs_rq, se ...)  <-- uses 'old' se->runnable_weight

    se->runnable_weight = se->my_q->h_nr_running  <-- 'new' value
                                                      (updated in previous
                                                       level, group cfs_rq)

    [...]
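
To make the ordering concrete, here is a minimal userspace sketch of that
loop. It is a toy model, not the kernel code: the structs carry only the
fields discussed above, and toy_update_load_avg(), toy_throttle_walk() and
the seen_by_update field are hypothetical names made up for illustration.
The real loop also deals with dequeueing, on_rq checks and so on, which are
left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields discussed here. */
struct cfs_rq {
	unsigned int h_nr_running;
};

struct sched_entity {
	unsigned int runnable_weight;	/* cached copy of my_q->h_nr_running */
	unsigned int seen_by_update;	/* hypothetical: weight the update saw */
	struct cfs_rq *my_q;		/* group cfs_rq owned by this se */
	struct cfs_rq *cfs_rq;		/* cfs_rq this se is queued on */
	struct sched_entity *parent;
};

/* Stand-in for update_load_avg(): only records the weight it observed. */
static void toy_update_load_avg(struct sched_entity *se)
{
	se->seen_by_update = se->runnable_weight;
}

/* Walk up from se and remove task_delta tasks at each level. */
static void toy_throttle_walk(struct sched_entity *se, unsigned int task_delta)
{
	assert(se != NULL);

	for (; se; se = se->parent) {
		struct cfs_rq *qcfs_rq = se->cfs_rq;

		/* 1) the update still sees the 'old' cached weight */
		toy_update_load_avg(se);

		/* 2) refresh the cache: my_q was adjusted one level below */
		se->runnable_weight = se->my_q->h_nr_running;

		/* 3) adjust this level; the next level picks it up in 2) */
		qcfs_rq->h_nr_running -= task_delta;
	}
}
```

With 2 tasks in /tg1/tg11 and 3 runnable tasks on the root cfs_rq,
walking up from tg11's group se with task_delta = 2 leaves tg1's group se
with seen_by_update == 2 (the 'old' value) and runnable_weight == 0 (the
'new' value, taken from tg1's cfs_rq after the previous level adjusted it).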