Re: [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups
From: Peter Zijlstra
Date: Thu May 07 2026 - 06:53:55 EST

On Tue, May 05, 2026 at 09:56:58AM -1000, Tejun Heo wrote:
> Hello,
>
> Some high level comments:
>
> - Please align it with existing cgroup2 interface files. See cpu.max. This
> can be e.g. cpu.rt.max with about the same semantics.
>
> - cgroup2 enforces that internal cgroups w/ controllers enabled cannot have
> threads in them. No need to enforce that separately.

Looking at cpu_period_quota_parse() this thing takes two u64 values for:
{runtime, period} but allows runtime to be the string "max".

I think we'd want an optional extension to that and allow 3 values for:
{runtime, period, deadline}, where if the deadline is not given, it will
be the same as period.
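To make the optional third value concrete, here is a minimal userspace
sketch of such a parser. It is not the kernel's cpu_period_quota_parse();
struct rt_bw, rt_bw_parse() and RT_RUNTIME_MAX are made-up names, and
requiring a period on every write is an assumption of the sketch. The
runtime <= deadline <= period check is borrowed from the SCHED_DEADLINE
parameter rules, not from anything in this series.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the string "max"; hypothetical, not a kernel constant. */
#define RT_RUNTIME_MAX	UINT64_MAX

/* Hypothetical container for the parsed triple. */
struct rt_bw {
	uint64_t runtime;
	uint64_t period;
	uint64_t deadline;
};

/*
 * Parse "$RUNTIME $PERIOD [$DEADLINE]", where $RUNTIME may be "max".
 * A missing deadline defaults to the period.
 * Returns 0 on success, -EINVAL on malformed or inconsistent input.
 */
static int rt_bw_parse(const char *buf, struct rt_bw *bw)
{
	char tok[21];	/* 20 digits of a decimal u64 plus NUL */
	uint64_t runtime, period, deadline;
	int n;

	n = sscanf(buf, "%20s %" SCNu64 " %" SCNu64, tok, &period, &deadline);
	if (n < 2)
		return -EINVAL;		/* sketch assumption: period is mandatory */
	if (n == 2)
		deadline = period;	/* implicit deadline */

	if (!strcmp(tok, "max"))
		runtime = RT_RUNTIME_MAX;
	else if (sscanf(tok, "%" SCNu64, &runtime) != 1)
		return -EINVAL;

	/* assumed sanity rule: runtime <= deadline <= period */
	if (!period || deadline > period)
		return -EINVAL;
	if (runtime != RT_RUNTIME_MAX && runtime > deadline)
		return -EINVAL;

	bw->runtime = runtime;
	bw->period = period;
	bw->deadline = deadline;
	return 0;
}
```

With this, "50000 100000" ends up with deadline == period, while
"max 100000 80000" gives an unlimited runtime with an explicit deadline.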
In previous versions there was also an option to specify a cpumask,
getting rid of that is one of the reasons I suggested making this thing
a cgroup-v2 thing, then we can use the cpuset controller's effective
mask.

> - However, the cpu controller is a threaded controller which means that it
> can have threaded sub-hierarchy where the no-internal-process rule doesn't
> apply. This was created explicitly for cpu controller. The proposed change
> blocks it effectively forcing cpu controller into regular domain
> controller behavior subject to no-internal-process rule. Note these are
> enforced at controller granularity and this means that users who use the
> threaded mode will be forced to pick between the two.

Right... this then means we need two controls, one to do hierarchical
bandwidth distribution, and one to assign bandwidth to the internal
group -- which is then subject to its own bandwidth distribution
constraint.

This might be a little confusing, but there is no way around that
AFAICT.
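One possible reading of how the two controls interact, as a rough
sketch: the bandwidth given to the group's own threads plus the
bandwidth handed to its children must fit in what the group received
from its parent. The rule itself and the name rt_bw_fits() are
assumptions for illustration; to_ratio() only mimics the fixed-point
utilization used by the scheduler (runtime / period, scaled by 2^20).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BW_SHIFT	20

/*
 * runtime / period as a fixed-point fraction scaled by 1 << BW_SHIFT.
 * Sketch only: large runtimes would overflow the shift.
 */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (!period)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/*
 * Assumed constraint: the internal group's share plus the sum of the
 * children's shares may not exceed the group's total share.
 */
static bool rt_bw_fits(uint64_t total, uint64_t internal,
		       const uint64_t *child, size_t nr)
{
	uint64_t sum = internal;
	size_t i;

	for (i = 0; i < nr; i++)
		sum += child[i];

	return sum <= total;
}
```

E.g. a group with 50ms/100ms in total, 10ms/100ms for its own threads
and two children at 20ms/100ms each fits; bumping one child to
30ms/100ms does not.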
> - This has the same problem with cgroup1's rt cgroup sched support where
> there is no way to have a permissive default configuration, which means
> that users who don't really care about distributing rt shares
> hierarchically would get blocked from running rt processes by default,
> which basically forces distros to disable rt cgroup sched support. This is
> not new but it'd be a shame to put in all the work and the end result is
> that most people don't even have access to the feature.

Right, but cgroup-v2 allows enabling/disabling specific controllers for
a (sub)-hierarchy, right? So if the controller is not enabled (by
default), it will fall back to putting the tasks in whatever parent does
have it on, and by default the root group would have it and would accept
tasks.
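The fallback described above amounts to a walk up the hierarchy until a
group with the control enabled is found. A toy model of that lookup,
assuming the root always has it enabled; struct toy_cgroup and
rt_effective_group() are illustrative names, not kernel structures or
functions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a cgroup; not the kernel's struct cgroup. */
struct toy_cgroup {
	const char *name;
	struct toy_cgroup *parent;	/* NULL for the root */
	bool rt_enabled;		/* RT control enabled for this group */
};

/*
 * Return the nearest group, starting at @cg itself, that has the RT
 * control enabled. The root is assumed to always have it, so the walk
 * stops there at the latest.
 */
static struct toy_cgroup *rt_effective_group(struct toy_cgroup *cg)
{
	while (cg->parent && !cg->rt_enabled)
		cg = cg->parent;
	return cg;
}
```

A task in a group two levels below the root, with the control enabled
nowhere but the root, would then be accounted against the root group.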
Additionally, I think we want a flag to allow non-priv tasks to use RT
inside the controller -- after all, these tasks would be subject to
strict bandwidth controls and cannot burn the system like unbounded/root
FIFO tasks can.

Does that all sound workable?