About group scheduling for SCHED_DEADLINE

From: Luca Abeni
Date: Sun Oct 09 2016 - 15:40:42 EST

Hi all,

after the SCHED_DEADLINE TODO page
(https://github.com/jlelli/sched-deadline/wiki/TODOs) has been
published, there has been a private exchange of emails about the "group
scheduling (cgroups)" / "hierarchical DEADLINE server for FIFO/RR" item.
I'd like to start a discussion about this topic, so that the TODO item
can be implemented in a way that is agreed upon by everyone.
I have added in cc all the people involved in the previous email exchange
about this topic + Andrea, who originally developed a patch
implementing hierarchical SCHED_DEADLINE (see
http://retis.sssup.it/~nino/publication/rtlws14bdm.pdf and cited
papers); I do not know who else to cc, so feel free to forward this
email to the relevant people or to tell me who to add in future emails.

So, I started to think about this, and here are some ideas to start a
discussion:

1) First of all, we need to decide the software interface. If I
understand correctly (please correct me if I am wrong), cgroups let
you specify a runtime and a period, and this means that the cgroup is
reserved the specified runtime every period on all the cgroup's
CPUs... In other words, it is not possible to reserve different
runtimes/periods on different CPUs. Is this correct? Is this what we
want for hierarchical SCHED_DEADLINE? Or do we want to allow the
possibility to schedule a cgroup with multiple "deadline servers"
having different runtime/period parameters? (the first solution is
easier to implement, the second one offers more degrees of freedom
that might be used to improve the real-time schedulability)
2) Is it ok to have only two levels in the scheduling hierarchy (at least
in the first implementation)?
3) If this "hierarchical SCHED_DEADLINE" is implemented using multiple
"deadline servers" (one per cgroup's CPU) to schedule the cgroup's
tasks, should these servers be bound to CPUs, or should they be free
to migrate between the cgroup's CPUs? In the first case, each one of
these deadline servers can be implemented as a sched_dl_entity
structure that can be scheduled only on a specific runqueue. The
second case is (in my understanding) more complex to implement,
because the dl push/pull code uses task structures, so a dl
scheduling entity per server is not enough (unless we modify the
migration code). At least, this is what I understood when looking at
the code.
4) From a more theoretical point of view, it would be good to define
the scheduling model that needs to be implemented (based on something
previously described in some paper, or defining a new model from
scratch).
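
To make the two interface options in 1) and the "servers bound to CPUs"
case in 3) a bit more concrete, here is a rough user-space sketch. It is
not kernel code: struct dl_server, struct dl_cgroup, NR_CPUS and all the
function names are hypothetical and only stand in for whatever the real
implementation would use. The admission test assumes partitioned EDF
with implicit deadlines (the servers bound to a CPU fit if their total
bandwidth is at most 1), which is a simplification of what the kernel
actually does.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical number of CPUs in the cgroup */

/* Hypothetical per-CPU "deadline server": runtime reserved every period. */
struct dl_server {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Hypothetical cgroup with one server bound to each of its CPUs. */
struct dl_cgroup {
	struct dl_server server[NR_CPUS];
};

/* First option in 1): the same runtime/period on all the cgroup's CPUs. */
static void cgroup_set_uniform(struct dl_cgroup *cg,
			       uint64_t runtime_ns, uint64_t period_ns)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		cg->server[cpu].runtime_ns = runtime_ns;
		cg->server[cpu].period_ns = period_ns;
	}
}

/* Second option in 1): different parameters on different CPUs. */
static void cgroup_set_cpu(struct dl_cgroup *cg, int cpu,
			   uint64_t runtime_ns, uint64_t period_ns)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cg->server[cpu].runtime_ns = runtime_ns;
	cg->server[cpu].period_ns = period_ns;
}

/* Server bandwidth (runtime / period) in parts per million. */
static uint64_t server_bw_ppm(const struct dl_server *s)
{
	return s->period_ns ? s->runtime_ns * 1000000ULL / s->period_ns : 0;
}

/*
 * Simplified admission test for servers bound to one CPU: the sum of
 * the bandwidths of all the servers on that CPU must not exceed 1.
 */
static int cpu_admits(const struct dl_cgroup *groups, int nr_groups, int cpu)
{
	uint64_t total_ppm = 0;

	for (int i = 0; i < nr_groups; i++)
		total_ppm += server_bw_ppm(&groups[i].server[cpu]);

	return total_ppm <= 1000000ULL;
}
```

With the first option, a single (runtime, period) pair is copied to all
the servers, so the check gives the same answer on every CPU; with the
second option, each CPU has to be checked on its own. Servers that
migrate between CPUs would need a different (global) test, which this
sketch does not try to model.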

Well, I hope this can be a good starting point for a discussion :)