Re: [RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list

From: Vincent Guittot
Date: Thu Jun 02 2016 - 03:43:00 EST


On 1 June 2016 at 19:42, <bsegall@xxxxxxxxxx> wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>
>> On Tue, May 24, 2016 at 11:55:10AM +0200, Vincent Guittot wrote:
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 218f8e8..6d3fbf2 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -290,15 +290,31 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>>> * Ensure we either appear before our parent (if already
>>> * enqueued) or force our parent to appear after us when it is
>>> * enqueued. The fact that we always enqueue bottom-up
>>> + * reduces this to two cases and a specila case for the root
>>
>> 'special'
>>
>>> + * cfs_rq.
>>> */
>>> if (cfs_rq->tg->parent &&
>>> cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->on_list) {
>>> + /* Add the child just before its parent */
>>> + list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
>>> + &(cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->leaf_cfs_rq_list));
>>> + rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list;
>>> + } else if (!cfs_rq->tg->parent) {
>>> + /*
>>> + * cfs_rq without parent should be
>>> + * at the end of the list
>>> + */
>>> list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
>>> &rq_of(cfs_rq)->leaf_cfs_rq_list);
>>> + rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list;
>>> + } else {
>>> + /*
>>> + * Our parent has not already been added so make sure
>>> + * that it will be put after us
>>> + */
>>> + list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
>>> + rq_of(cfs_rq)->leaf_alone);
>>> + rq_of(cfs_rq)->leaf_alone = &cfs_rq->leaf_cfs_rq_list;
>>> }
>>>
>>> cfs_rq->on_list = 1;
>>
>> Paul, Ben ?
>>
>> This used to be critical for update_shares() (now
>> update_blocked_averages), but IIRC is not critical for that since
>> PELT.
>
> Yeah, given that we no longer update_cfs_shares in that path, it
> shouldn't be as important (unless new code is being added that it will
> be useful for). That said, I honestly don't remember why we don't
> update_cfs_shares, as it could affect the load.weight being used in a
> parent's computation. Paul, do you remember? Was it just too expensive
> and less necessary given the other improvements?
>
>>
>> I find its more readable with like so..
>>
>>
>> Also; I feel the comments can use more love; my head hurts ;-)
>
> Yeah
>
>
> leaf_alone here is basically a temporary for the duration of an
> enqueue_task_fair call, yes? A name suggesting that might be useful, or

yes it is

> else a comment mentioning that one of the first two cases will always
> clear the otherwise unsafe reference before it can be a problem.

ok, i will add a comment

>
> I think this also only barely works with throttling: even if the tg as a
> whole is out of runtime, an individual cfs_rq can't be throttled until
> just one line after list_add_cfs_rq, and we never list_del until cgroup
> destruction. A throttled !on_list cfs_rq would cause us to never reset
> leaf_alone, but I don't think that can quite happen.

yes, i don't see how this can happen too

>
>>
>> ---
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -286,35 +286,38 @@ static inline struct cfs_rq *group_cfs_r
>> static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>> {
>> if (!cfs_rq->on_list) {
>> + struct rq *rq = rq_of(cfs_rq);
>> + int cpu = cpu_of(rq);
>> +
>> /*
>> * Ensure we either appear before our parent (if already
>> * enqueued) or force our parent to appear after us when it is
>> * enqueued. The fact that we always enqueue bottom-up
>> - * reduces this to two cases and a specila case for the root
>> + * reduces this to two cases and a special case for the root
>> * cfs_rq.
>> */
>> if (cfs_rq->tg->parent &&
>> - cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->on_list) {
>> + cfs_rq->tg->parent->cfs_rq[cpu]->on_list) {
>> /* Add the child just before its parent */
>> list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
>> - &(cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->leaf_cfs_rq_list));
>> - rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list;
>> + &(cfs_rq->tg->parent->cfs_rq[cpu]->leaf_cfs_rq_list));
>> + rq->leaf_alone = &rq->leaf_cfs_rq_list;
>> } else if (!cfs_rq->tg->parent) {
>> /*
>> * cfs_rq without parent should be
>> * at the end of the list
>> */
>> list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
>> - &rq_of(cfs_rq)->leaf_cfs_rq_list);
>> - rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list;
>> + &rq->leaf_cfs_rq_list);
>> + rq->leaf_alone = &rq->leaf_cfs_rq_list;
>> } else {
>> /*
>> * Our parent has not already been added so make sure
>> * that it will be put after us
>> */
>> list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
>> - rq_of(cfs_rq)->leaf_alone);
>> - rq_of(cfs_rq)->leaf_alone = &cfs_rq->leaf_cfs_rq_list;
>> + rq->leaf_alone);
>> + rq->leaf_alone = &cfs_rq->leaf_cfs_rq_list;
>> }
>>
>> cfs_rq->on_list = 1;