Re: Task group cleanups and optimizations (was: Re: [RFC 00/60] Coscheduling for Linux)

From: Jan H. SchÃnherr
Date: Wed Sep 19 2018 - 05:23:37 EST

On 09/18/2018 04:35 PM, Rik van Riel wrote:
> On Tue, 2018-09-18 at 15:22 +0200, Jan H. SchÃnherr wrote:
> Task priorities in a flat runqueue are relatively straightforward, with
> vruntime scaling just like done for nice levels, but I have to admit
> that throttled groups provide a challenge.

Do you know/have an idea how the flat approach would further skew the
approximations currently done in calc_group_shares()/calc_group_runnable()?

With the nested hierarchy the (shared) task group SE is updated
whenever something changes. With the flat approach, you'd only be
able to update a task SE, when you have to touch the task anyway.
Just from thinking briefly about it, it feels like values would be
out of date for longer periods of time.

>> However, the only efficient way that I can currently think of, is a
>> hybrid model between the "full nesting" that is currently there, and
>> the "no nesting" you were describing above.
>> It would flatten all task groups that do not actively contribute some
>> function, which would be all task groups purely for accounting
>> purposes and those for *unthrottled* CFS hierarchies (and those for
>> coscheduling that contain exactly one SE in a runqueue). The nesting
>> would still be kept for *throttled* hierarchies (and the coscheduling
>> stuff). (And if you wouldn't have mentioned a way to get rid of
>> nesting completely, I would have kept a single level of nesting for
>> accounting purposes as well.)
>> This would allow us to lazily dequeue SEs that have run out of
>> bandwidth when we encounter them, and already enqueue them in the
>> nested task group (whose SE is not enqueued at the moment). That way,
>> it's still a O(1) operation to re-enable all tasks, once runtime is
>> available again. And O(1) to throttle a repeat offender.
> I suspect most systems will have a number of runnable tasks no larger
> than the number of CPUs most of the time.
> That makes "re-enable all the tasks" often equivalent to "re-enable one
> task".
> Can we handle the re-enabling (or waking up!) of one task almost as fast
> as we can without the cpu controller?

We could start by transparently special-casing the "just one SE in a
runqueue" case, where that single SE is enqueued directly into the next
parent, and everything falls back to nesting, the moment a second SE pops

That might allow reaping the benefits for many cases without hurting
other cases. It's also more a gradual conversion of code.