Re: [PATCH 0/9] sched: Core scheduling interfaces

From: Tejun Heo
Date: Tue Apr 06 2021 - 12:08:55 EST


Hello,

On Tue, Apr 06, 2021 at 05:32:04PM +0200, Peter Zijlstra wrote:
> > I find it difficult to like the proposed interface from the name (the term
> > "core" is really confusing given how the word tends to be used internally)
> > to the semantics (it isn't like anything else) and even the functionality
> > (we're gonna have fixed processors at some point, right?).
>
> Core is the topological name for the thing that hosts the SMT threads.
> Can't really help that.

I find the name pretty unfortunate given how overloaded the term is, both
in general and within the kernel, but oh well...

> > Here are some preliminary thoughts:
> >
> > * Are both prctl and cgroup based interfaces really necessary? I could be
> > being naive but given that we're (hopefully) working around hardware
> > deficiencies which will go away in time, I think there's a strong case for
> > minimizing at least the interface to the bare minimum.
>
> I'm not one for cgroups much, so I'll let others argue that case, except
> that per systemd and all the other new fangled shit, people seem to use
> cgroups a lot to group tasks. So it makes sense to also expose this
> through cgroups in some form.

All the new fangled things follow a certain usage pattern which makes
aligning parts of the process tree with the cgroup layout trivial. So when
restrictions can be applied along the process tree like this and there
isn't a particular need for dynamic hierarchical control, there isn't much
need for a direct cgroup interface.

> That said; I've had requests from lots of non security folks about this
> feature to help mitigate the SMT interference.
>
> Consider for example Real-Time. If you have an active SMT sibling, the
> CPU performance is much less than it would be when the SMT sibling is
> idle. Therefore, for the benefit of determinism, it would be very nice
> if RT tasks could force-idle their SMT siblings, and voila, this
> interface allows exactly that.
>
> The same is true for other workloads that care about interference.

I see.

> > Given how cgroups are set up (membership operations happening only for
> > seeding, especially with the new clone interface), it isn't too difficult
> > to synchronize process tree and cgroup hierarchy where it matters - ie.
> > given the right per-process level interface, restricting configuration for
> > a cgroup sub-hierarchy may not need any cgroup involvement at all. This
> > also nicely gets rid of the interaction between prctl and cgroup bits.
>
> I've no idea what you mean :/ The way I use cgroups (when I have to, for
> testing) is to echo the pid into /cgroup/foo/tasks. No clone or anything
> involved.

The usage pattern is creating a new cgroup, seeding it with a process
(either by writing to tasks or by using CLONE_INTO_CGROUP) and letting it
continue only in that sub-hierarchy, so the cgroup hierarchy usually
partially overlays the process tree.
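
As a rough sketch of the write-based variant of that pattern (not the
CLONE_INTO_CGROUP one), assuming a cgroup filesystem mounted somewhere
under the caller's control: seed_cgroup(), its parameters, and the
"cgroup.procs" default are made up for illustration; the membership file
is "tasks" on some setups.

```python
from pathlib import Path


def seed_cgroup(parent: Path, name: str, pid: int,
                procs_file: str = "cgroup.procs") -> Path:
    """Create a child cgroup under `parent` and seed it with `pid`.

    Hypothetical helper: the names and the default membership file are
    assumptions, not anything taken from the patch series.
    """
    cg = parent / name
    # On a cgroup filesystem, creating a directory creates the cgroup.
    cg.mkdir(exist_ok=True)
    # Writing a PID to the membership file moves that process in. Its
    # later children are created in the same cgroup, which is why the
    # cgroup hierarchy ends up overlaying part of the process tree.
    with open(cg / procs_file, "w") as f:
        f.write(f"{pid}\n")
    return cg
```

On a real system this would be called with the cgroup mount point as
`parent` and needs sufficient privileges; after the seeding step no
further membership operations are needed.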

> None of my test machines come up with cgroupfs mounted, and any and all
> cgroup setup is under my control.
>
> > * If we *have* to have cgroup interface, I wonder whether this would fit a
> > lot better as a part of cpuset. If you squint just right, this can be
> > viewed as some dynamic form of cpuset. Implementation-wise, it probably
> > won't integrate with the rest but I think the feature will be less jarring
> > as a part of cpuset, which already is a bit of kitchensink anyway.
>
> Not sure I agree, we do not change the affinity of things, we only
> control who's allowed to run concurrently on SMT siblings. There could
> be a cpuset partition split between the siblings and it would still work
> fine.

I see. Yeah, if we really need it, I'm not sure it fits in the cgroup
interface proper. As I wrote elsewhere, these things are usually
implemented on the originating subsystem's interface with a cgroup ID as a
parameter.

Thanks.

--
tejun