Re: [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups

From: Peter Zijlstra

Date: Tue May 05 2026 - 11:19:37 EST


On Thu, Apr 30, 2026 at 11:38:24PM +0200, Yuri Andriaccio wrote:
> From: luca abeni <luca.abeni@xxxxxxxxxxxxxxx>
>
> Allow for cgroup hierarchies with more than two levels.
>
> Introduce the concept of live and active groups:
> - A group is live if it is a leaf group or if all its children have zero
> runtime.
> - A live group with non-zero runtime can be used to schedule tasks.
> - An active cgroup is a live group with running tasks.
> - A non-live group cannot be used to run tasks; it is used only for
> bandwidth accounting, i.e. the sum of its children's bandwidth must be
> less than or equal to the bandwidth of the parent. This allows cgroups
> to be used for bandwidth management for different users.
> - While the root cgroup specifies the total allocatable bandwidth of rt
> cgroups, further accounting is performed to keep track of the live
> bandwidth, i.e. the sum of the bandwidth of live groups. The hierarchy
> invariant states that the live bandwidth must always be less than or
> equal to the total allocatable bandwidth.
>
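
The definitions above can be modelled in a few lines of user-space C. This
is only a sketch to make the terms concrete, not the code from the patch:
struct group, MAX_CHILDREN and the three helpers are hypothetical names,
and the real implementation works on struct task_group and per-CPU
deadline servers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4	/* arbitrary limit for this model only */

/* Hypothetical user-space model; not the kernel's struct task_group. */
struct group {
	struct group *children[MAX_CHILDREN];
	size_t nr_children;		/* 0 for a leaf group */
	unsigned long long runtime;	/* 0 means the group is disabled */
	unsigned int nr_running;	/* running tasks in this group */
};

/* Live: a leaf group, or a group whose children all have zero runtime. */
static bool group_is_live(const struct group *g)
{
	assert(g->nr_children <= MAX_CHILDREN);

	for (size_t i = 0; i < g->nr_children; i++)
		if (g->children[i]->runtime != 0)
			return false;
	return true;
}

/* A live group with non-zero runtime can be used to schedule tasks. */
static bool group_can_schedule(const struct group *g)
{
	return group_is_live(g) && g->runtime != 0;
}

/* Active: a live group with running tasks. */
static bool group_is_active(const struct group *g)
{
	return group_is_live(g) && g->nr_running > 0;
}
```

In this model a parent with one enabled child is non-live, and becomes
live again as soon as that child's runtime is set to zero.
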
> Add is_live_sched_group() and sched_group_has_live_siblings() in
> deadline.c. These utility functions are used by dl_init_tg() to perform
> updates only when necessary:
> - Only live groups may update the active dl bandwidth of dl entities
> (call to dl_rq_change_utilization), while non-live groups must not use
> servers, and thus must not change the active dl bandwidth.
> - The total bandwidth accounting must be changed to follow the
> live/non-live rules:
>   - When disabling (runtime zero) the last child of a group, the parent
>     becomes a live group, and so the parent's bw must be accounted back.
>   - When enabling (runtime non-zero) the first child, the parent becomes a
>     non-live group, and so the parent's bandwidth must be removed.
>
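
The two accounting transitions described above (first child enabled, last
child disabled) can be sketched as follows. This is a simplified
user-space model under stated assumptions, not the logic of dl_init_tg():
struct grp, live_bw and leaf_set_bw() are hypothetical names, only leaf
groups change bandwidth, and the special case for children of the root
cgroup is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a group for bandwidth accounting only. */
struct grp {
	struct grp *parent;
	unsigned int nr_enabled_children;	/* children with non-zero bw */
	uint64_t bw;				/* 0 means disabled */
};

/* Sum of the bandwidth of all live groups (model-wide, not per-CPU). */
static uint64_t live_bw;

/* Change the bandwidth of a leaf group and keep live_bw consistent. */
static void leaf_set_bw(struct grp *g, uint64_t new_bw)
{
	struct grp *p = g->parent;
	bool was_enabled = g->bw != 0;
	bool now_enabled = new_bw != 0;

	/* Assumption of this sketch: g has no enabled children, so it is live. */
	assert(g->nr_enabled_children == 0);

	if (p && !was_enabled && now_enabled) {
		/* First child enabled: the parent stops being live. */
		if (p->nr_enabled_children++ == 0)
			live_bw -= p->bw;
	} else if (p && was_enabled && !now_enabled) {
		/* Last child disabled: the parent becomes live again. */
		if (--p->nr_enabled_children == 0)
			live_bw += p->bw;
	}

	/* The leaf itself is live, so its own bandwidth is always counted. */
	live_bw = live_bw - g->bw + new_bw;
	g->bw = new_bw;
}
```

With a parent of bandwidth 50 that starts out live (live_bw == 50),
enabling a child with bandwidth 20 leaves live_bw at 20, and disabling it
again restores 50.
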
> Update tg_set_rt_bandwidth() to change the runtime of a group to a
> non-zero value only if its parent is inactive, thus forcing the parent to
> become non-live if it was previously live (it would already have been
> non-live if a sibling cgroup was live). An exception is made for groups
> which have the root cgroup as parent.
>
> Update sched_rt_can_attach() to allow attaching only on live groups.
>
> Update dl_init_tg() to take a task_group pointer and a CPU id rather
> than a direct pointer to the CPU's deadline server. The task_group
> pointer is necessary to check and update the live bandwidth
> accounting.
>
> Co-developed-by: Yuri Andriaccio <yurand2000@xxxxxxxxx>
> Signed-off-by: Yuri Andriaccio <yurand2000@xxxxxxxxx>
> Signed-off-by: luca abeni <luca.abeni@xxxxxxxxxxxxxxx>

This probably wants to have the cgroup folks on Cc (added now) to make
sure the semantics are in line with cgroup-v2 expectations.