Re: [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups

From: luca abeni

Date: Thu May 07 2026 - 12:40:00 EST

Hi,

On Thu, 7 May 2026 17:03:41 +0200
Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:

> On 07/05/26 12:53, Peter Zijlstra wrote:
> > On Tue, May 05, 2026 at 09:56:58AM -1000, Tejun Heo wrote:
>
> ...
>
> > > - However, the cpu controller is a threaded controller which
> > > means that it can have threaded sub-hierarchy where the
> > > no-internal-process rule doesn't apply. This was created
> > > explicitly for cpu controller. The proposed change blocks it
> > > effectively forcing cpu controller into regular domain controller
> > > behavior subject to no-internal-process rule. Note these are
> > > enforced at controller granularity and this means that users who
> > > use the threaded mode will be forced to pick between the two.
> >
> > Right... this then means we need two controls, one to do
> > hierarchical bandwidth distribution, and one to assign bandwidth to
> > the internal group -- which is then subject to its own bandwidth
> > distribution constraint.
> >
> > This might be a little confusing, but there is no way around that
> > AFAICT.
>
> Just to check if I'm following, you are thinking something like below?
>
> groupA/
> cpu.rt.max = "50 50 100" <- 0.5 from root
> cpu.rt.internal = "20 20 100" <- 0.2 from groupA for threads at
> this level
> + threadA <
> + threadB <
> +- group1/
> cpu.rt.max = "30 30 100" <- 0.3 from groupA
> + threadC
>
> And we still keep it flat, so 2 dl-entities (per CPU), one handles
> threads at groupA level and the other threads inside group1?

An alternative idea I was thinking about: we create 2 dl entities (one
for "groupA" and one for "group1"); we set cpu.rt.max for groupA, and
we subtract group1's utilization from it (so, if groupA's cpu.rt.max is
"50 100" and group1's cpu.rt.max is "30 100", groupA is served by a dl
entity (50-30,100)=(20,100) while group1 is served by a dl entity
(30,100)).

Basically, with this idea the "internal" reservation is automatically
computed based on rt.max and on the children cgroups. A possible issue
is that if the children consume all the groupA's utilization the groupA
RT tasks remain with 0 runtime (and never execute).

Luca