Re: [PATCH 13/15] sched,fair: propagate sum_exec_runtime up the hierarchy

From: Dietmar Eggemann
Date: Thu Aug 29 2019 - 13:20:53 EST

On 28/08/2019 15:14, Rik van Riel wrote:
> On Wed, 2019-08-28 at 09:51 +0200, Dietmar Eggemann wrote:
>> On 22/08/2019 04:17, Rik van Riel wrote:
>>> Now that enqueue_task_fair and dequeue_task_fair no longer iterate
>>> up
>>> the hierarchy all the time, a method to lazily propagate
>>> sum_exec_runtime
>>> up the hierarchy is necessary.
>>> Once a tick, propagate the newly accumulated exec_runtime up the
>>> hierarchy,
>>> and feed it into CFS bandwidth control.
>>> Remove the pointless call to account_cfs_rq_runtime from
>>> update_curr,
>>> which is always called with a root cfs_rq.
>> But what about the call to account_cfs_rq_runtime() in
>> set_curr_task_fair()? Here you always call it with the root cfs_rq.
>> Shouldn't this be called also in a loop over all se's until !se-
>>> parent
>> (like in propagate_exec_runtime() further below).
> I believe that call should be only on the cgroup
> cfs_rq, with account_cfs_rq_runtime figuring out
> whether more runtime needs to be obtained from
> further up in the hierarchy.

So like this?

@@ -10248,7 +10248,8 @@ static void set_curr_task_fair(struct rq *rq)

set_next_entity(cfs_rq, se);
/* ensure bandwidth has been allocated on our new cfs_rq */
- account_cfs_rq_runtime(cfs_rq, 0);
+ if (task_se_in_cgroup(se))
+ account_cfs_rq_runtime(group_cfs_rq_of_parent(se), 0);

I fail to understand the second part of your sentence, and
how is this related to the code in propagate_exec_runtime():

for_each_sched_entity(se) {

propagate_exec_runtime() {

if (parent)
account_cfs_rq_runtime(cfs_rq, diff);

> By default we should probably work under the assumption
> that account_cfs_rq_runtime() will succeed at the current
> level, and no gymnastics are required to obtain CPU time.

Maybe this all will become clearer when the reworked CFS Bandwidth
support is ready ;-) I see this patch as the first part of it.