Re: [RFC][PATCH 0/8] Use EDF to throttle RT task groups

From: Fabio Checconi
Date: Thu Jul 09 2009 - 09:50:21 EST


> From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Date: Thu, Jul 09, 2009 12:36:13PM +0200
>
> Hi Fabio,
>
> Sorry for the delay, but we already spoke in person about this last
> week, a bit more detail here.
>

No problem, thank you for looking at this.


> On Mon, 2009-06-15 at 20:55 +0200, Fabio Checconi wrote:
> > This patchset introduces a group level EDF scheduler to extend the
> > current throttling mechanism, in order to make it support generic
> > period assignments. With this patch, the rt_runtime and rt_period
> > parameters can be used to specify arbitrary CPU reservations for
> > RT tasks.
>
> > This is an early RFC, I'm interested in having an idea of what people
> > think about this feature, if it's worth working on it, what can be
> > improved in the design, etc.
> >
> > The main design issues involved:
> >
> > - It is no more possible to specify RUNTIME_INF for a task group
> > when throttling is enabled. Rationale: supporting both throttled
> > and unthrottled groups would have required too much extra complexity
> > (I didn't find anything simpler than two parallel runqueues, one for
> > throttled and one for unthrottled groups).
>
> I would think this is more than reasonable to the point that I wonder if
> it was possible to begin with. I was assuming the current bandwidth
> validation stuff would already exclude this possibility.
>

>From the scheduling code it seemed possible, but I think I overlooked
the admission control part, sorry about that.


> > - Since it is not easy to mix tasks and groups on the same scheduler
> > queue (tasks have no deadlines), the bandwidth reserved to the tasks
> > in a group is controlled with two additional cgroup attributes:
> > rt_task_runtime_us and rt_task_period_us. These attrinutes control,
> > within a cgroup, how much bandwidth is reserved to the tasks it
> > contains.
>
> Agreed.
>
> > - Shared resources are still handled using boosting. When a group
> > contains a task inside a critical section it is scheduled according
> > the highest priority among the ones of the tasks it contains.
> > In this way, the same group has two modes: when it is not boosted
> > it is scheduled according to its deadline; when it is boosted, it
> > is scheduled according its priority. Boosted groups are always
> > favored over non-boosted ones.
>
> Yeah, for now this PCP like solution is the best we have. Preferably
> we'll come up with something really smart soon :-)
>
> > - The old priority array is now gone. To use only a single data
> > structure for entities using both priorities and deadlines (due
> > to boosting), the only possible choice was to use an rb-tree;
> > the function used to order the keys takes into account the
> > prioritization described above (boosted tasks, ordered by
> > priority are favored to non-boosted tasks, ordered by increasing
> > deadline).
> >
> > - Given that the various rt_rq's belonging to the same task group
> > are activated independently, there is the need of a timer per
> > each rt_rq.
>
> Like I suggested last week, we could flatten the full hierarchy to a 2
> level one, the top level being a EDF like scheduler which purely
> schedules the 'phantom' task groups as created by the new rt_task_*_us
> parameters.
>

I really like this idea, it would simplify things a lot, with almost
no changes from the scheduling theory POV. I'll give it a try as soon
as possible.


> Once such a task group is selected we use the regular FIFO rules to pick
> a task.
>
> Further, like mentioned, you remove the bandwidth distribution between
> cpus in patch 4. You do this because you schedule the groups using PEDF,
> however I was thinking that when we use GEDF to schedule the groups we
> can use the theory from:
>
> H. Leontyev and J. Anderson, " A Hierarchical Multiprocessor Bandwidth
> Reservation Scheme with Timing Guarantees ", Proceedings of the 20th
> Euromicro Conference on Real-Time Systems, pp. 191-200, July 200
>
> http://www.cs.unc.edu/~anderson/papers/ecrts08c.pdf
>
> This would allow us to lift this constraint.
>
> As to load-balancing. The current scheme of load-balancing the tasks
> utterly ignores the deadline settings and would also need some suitable
> changes. I fear this part will be a little more challenging.
>

I was thinking about doing things gradually: first extend throttling
to handle generic periods, then extend the push-pull logic (I think you
are referring to it with load-balancing) to fully support it, and then
think about global EDF. I think it would be difficult to do all the
things at one time.

I'm not sure that a pure GEDF approach would scale well given the
number of CPUs Linux can support in extreme configurations, maybe a
clustered approach would be more suitable (Anderson's group has published
interesting results both about GEDF scalability and C-EDF). The focus
on bounded tardiness for soft real time tasks in GEDF would also give
a slightly different meaning to the runtime/period assignments done at
the interface level.

So my proposal was in the direction of introducing group EDF scheduling
consistently with the current throttling implementation, to obtain a more
flexible form of assigning CPU reservations. GEDF can be implemented on
top of it, adding a push-pull logic to the PEDF queues; anyway I would
not do this without extended benchmarking in various hw configurations.

(Other schemes to implement GEDF would require major surgery to the
scheduler code, a thing that I wanted to avoid.)

I understand your concern with the removal of the bandwidth distribution
logic, I'll try to reintroduce it in my next post, taking into account
the PEDF schedulability constraints.


> I would think that by using the GEDF and minimal concurrency group
> scheduling we'd get the desired deadline behaviour. After that we'd get
> the task of selecting the FIFO tasks within the dynamic vcpu range
> resulting from the deadline server.
>
> We could simply implement global-fifo to make it work, and then move
> towards adapting the current (!group) load-balancer to work within these
> more dynamic constraints.
>
> Thoughts?

About minimal concurrency group scheduling, I am not sure of how we
would handle CPUs hot insertion/extraction, or how the allocation would
be done efficiently (avoiding bin-packing issues) online inside the kernel.

To adapt the current load-balancer to the choices of the deadline-based
scheduler I was thinking about using a cpupri-like structure per task_group,
but now I'm not able to estimate the resulting overhead...

Do you think that this gradual approach makes sense?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/