Re: [RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list

From: Vincent Guittot
Date: Thu May 26 2016 - 05:55:57 EST

On 25 May 2016 at 19:40, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> Hi Vincent,
> On 24/05/16 10:55, Vincent Guittot wrote:
>> Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that
>> a child will always called before its parent.
>> The hierarchical order in shares update list has been introduced by
>> commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list")
>> With the current implementation a child can be still put after its parent.
>> Lets take the example of
>> root
>> \
>> b
>> /\
>> c d*
>> |
>> e*
>> with root -> b -> c already enqueued but not d -> e so the leaf_cfs_rq_list
>> looks like: head -> c -> b -> root -> tail
>> The branch d -> e will be added the first time that they are enqueued,
>> starting with e then d.
>> When e is added, its parents is not already on the list so e is put at the
>> tail : head -> c -> b -> root -> e -> tail
>> Then, d is added at the head because its parent is already on the list:
>> head -> d -> c -> b -> root -> e -> tail
>> e is not placed at the right position and will be called the last whereas
>> it should be called at the beginning.
>> Because it follows the bottom-up enqueue sequence, we are sure that we
>> will finished to add either a cfs_rq without parent or a cfs_rq with a parent
>> that is already on the list. We can use this event to detect when we have
>> finished to add a new branch. For the others, whose parents are not already
>> added, we have to ensure that they will be added after their children that
>> have just been inserted the steps before, and after any potential parents that
>> are already in the list. The easiest way is to put the cfs_rq just after the
>> last inserted one and to keep track of it untl the branch is fully added.
>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> So the use case for this would be you create a multi-level task group
> hierarchy on a cpu (e.g. tg_css_id=2,4,6 on cpu=1) and let a task run in
> the highest level task group (tg_css_id=6).
> list_add_leaf_cfs_rq() call:
> ...
> enqueue_task_fair: cpu=1 tg_css_id=6 cfs_rq=0xffffffc97500f700 on_list=0
> enqueue_task_fair: cpu=1 tg_css_id=4 cfs_rq=0xffffffc975175900 on_list=0
> enqueue_task_fair: cpu=1 tg_css_id=2 cfs_rq=0xffffffc975866d00 on_list=0
> enqueue_task_fair: cpu=1 tg_css_id=1 cfs_rq=0xffffffc97fec44e8 on_list=1
> ...
> In this case, the for_each_leaf_cfs_rq() in update_blocked_averages()
> iterates in the tg_css_id=2,1,6,4 order:
> ...
> update_blocked_averages: tg_css_id=2 cfs_rq=0xffffffc975866d00 on_list=1
> update_blocked_averages: tg_css_id=1 cfs_rq=0xffffffc97fec44e8 on_list=1
> update_blocked_averages: tg_css_id=6 cfs_rq=0xffffffc97500f700 on_list=1
> update_blocked_averages: tg_css_id=4 cfs_rq=0xffffffc975175900 on_list=1
> ...
> which I guess breaks the update_tg_cfs_util() call you introduced in
> update_blocked_averages() in '[RFC PATCH v2] sched: reflect sched_entity
> movement into task_group's utilization'. IMHO, otherwise
> update_blocked_averages() can deal with the list not being ordered
> tg_css_id=6,4,2,1.
> The use of for_each_leaf_cfs_rq() in update_shares() is gone. Do the
> remaining call sites (update_runtime_enabled(),
> unthrottle_offline_cfs_rqs() require any ordering?

I can see one potential issue with unthrottle_offline_cfs_rqs() which
goes to the hierarchy when unthrottling.
Otherwise i don't see any other dependency with ordering the cfs_rq
than the patch i have sent for task group utilization
The other point is this patch align the code with the comment

> [...]