Re: [PATCH 1/3] sched/fair: Disable runtime_enabled on dying rq

From: bsegall
Date: Mon Jun 23 2014 - 17:05:34 EST


Kirill Tkhai <tkhai@xxxxxxxxx> writes:

> On 23.06.2014 21:29, bsegall@xxxxxxxxxx wrote:
>> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
>>
>>> On Tue, Jun 17, 2014 at 05:24:10PM +0400, Kirill Tkhai wrote:
>>>> @@ -3790,6 +3803,12 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>>>> cfs_rq->runtime_remaining = 1;
>>>> if (cfs_rq_throttled(cfs_rq))
>>>> unthrottle_cfs_rq(cfs_rq);
>>>> +
>>>> + /*
>>>> + * Offline rq is schedulable till cpu is completely disabled
>>>> + * in take_cpu_down(), so we prevent new cfs throttling here.
>>>> + */
>>>> + cfs_rq->runtime_enabled = 0;
>>>
>>> Does it make sense to clear this before calling unthrottle_cfs_rq()?
>>> Just to make sure they're in the right order..
>>
>> I believe that order is irrelevant here - I do not believe that
>> unthrottle_cfs_rq(a) can cause a throttle_cfs_rq(a). In fact, I don't
>> see any code that will check it at all from unthrottle, although I might
>> be missing something. It _can_ cause a throttle_cfs_rq(parent_cfs_rq(a)),
>> but that should be fine as long as for_each_leaf_cfs_rq is sorted
>> correctly.
>
> I think this is correct. We may change the order just for the hope, that
> anybody will work on it in some way in the future, and this person could
> skip this opaque place. Ok, I don't know how is better :)
>
>> That said, migrate_tasks drops rq->lock, and I /think/ another cpu could
>> wake another task onto this cpu, which could then enqueue_throttle its
>> cfs_rq (which previously had no tasks and thus wasn't on
>> leaf_cfs_rq_list). You certainly could have tg_set_bandwidth come in and
>> turn runtime_enabled on.
>
> We mask cpu inactive on CPU_DOWN_PREPARE stage (in sched_cpu_inactive).
> Other cpu is not able to wake a task there after that.
>
> rq is masked offline in cpuset_cpu_inactive() (during the same stage).
> But priority of sched_cpu_inactive() is higher than priority of
> cpuset_cpu_inactive().
>
> CPU_PRI_SCHED_INACTIVE = INT_MIN + 1,
> CPU_PRI_CPUSET_INACTIVE = INT_MIN,
>
> This guarantees that nobody could use dying_cpu even before we start
> unthrottling. Another cpu will see dying_cpu is inactive.
>
> So, it looks like we are free of this problem.

Ah, ok, I haven't looked that hard at hotplug, and wasn't sure of the
ordering there. We still have the tg_set_cfs_bandwidth issue, because
that uses for_each_possible_cpu. However, with the addition of
rq_online_fair, that can probably be changed to for_each_active_cpu, and
then I think we would be fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/