Re: [PATCH 0/9] sched: Core scheduling interfaces

From: Peter Zijlstra
Date: Thu Apr 08 2021 - 12:49:07 EST


On Thu, Apr 08, 2021 at 03:25:52PM +0200, Michal Koutný wrote:
> On Wed, Apr 07, 2021 at 08:34:24PM +0200, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > IMO as long as cgroups have that tasks file, you get to support people
> > using it. That means that tasks joining your cgroup need to 'inherit'
> > cgroup properties.
> The tasks file is a consequence of binding this to cgroups, I'm one step
> back. Why make "core isolation" a cgroup property?

Yeah, dunno, people asked for it. I'm just proposing an implementation
that, when given the need, seems to make sense and is internally
consistent.

> (I understand this could help "visualize" what the common domains are if
> cgroups were the only API but with prctl the structure can be
> arbitrarily modified anyway.)
>
>
> > Given something like:
> >
> >            R
> >           / \
> >          A   B
> >             / \
> >            C   D
> Thanks for the example.
>
> > B group can set core_sched=1 and then all its (and its descendants') tasks
> > get to have the same (group) cookie and cannot share with others.
> The same could be achieved with the first task of group B allocating its
> new cookie which would be inherited by its descendants.

Except then the task can CLEAR its own cookie and escape the constraint.

> > In that scenario the D subtree is a restriction (doesn't share) with the
> > B subtree.
> This implies D's isolation from everything else too, not just B's
> members, no?

Correct. Look at it as a constraint on co-scheduling: you can never,
whatever you do, share an SMT sibling with someone outside your subtree.

> > And all of B is a restriction on all its tasks, including the Real-Time
> > task that set a task cookie, in that none of them can share with tasks
> > outside of B (including system tasks which are in R), irrespective of
> > what they do with their task cookie.
> IIUC, the equivalent restriction could be achieved with the PTRACE-like
> check in the prctl API too (with respectively divided uids).

I'm not sure I understand; if tasks in A and B are of the same user,
then ptrace will not help anything. And you always have ptrace on
yourself, so per the above you can CLEAR and escape your constraint.

> I'm curious whether the cgroup API actually simplifies things that are
> possible with the clone/prctl API or allows anything that wouldn't be
> otherwise possible.

With the cgroup API it is impossible for a task to escape the cgroup
constraint. It can never share a core with anything not in the subtree.

This is not possible with just the task interface.
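
To make the difference concrete, here is a toy model in Python. It is
not kernel code, only a sketch under stated assumptions: the effective
cookie is taken to be the pair (group cookie, task cookie), the group
cookie comes from the nearest cgroup at or above the task that has
core_sched set, and two tasks may share a core only when their pairs are
equal. The names Cgroup, Task, clear() and can_share_core() are made up
for illustration and do not exist in the patches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)
class Cgroup:
    name: str
    parent: Optional["Cgroup"] = None
    core_sched: bool = False

    def group_cookie(self) -> Optional[str]:
        # Assumption: the nearest cgroup (self or ancestor) with
        # core_sched set supplies the group cookie. The cgroup name
        # stands in for the cookie value.
        g: Optional[Cgroup] = self
        while g is not None:
            if g.core_sched:
                return g.name
            g = g.parent
        return None


@dataclass(eq=False)
class Task:
    name: str
    cgroup: Cgroup
    task_cookie: Optional[int] = None

    def clear(self) -> None:
        # A task can always drop its own task cookie ...
        self.task_cookie = None

    def cookie(self) -> Tuple[Optional[str], Optional[int]]:
        # ... but the group half is not under the task's control.
        return (self.cgroup.group_cookie(), self.task_cookie)


def can_share_core(a: Task, b: Task) -> bool:
    # Hypothetical rule: SMT siblings may only run tasks whose
    # effective cookies match exactly.
    return a.cookie() == b.cookie()


# The hierarchy from the example: R -> {A, B}, B -> {C, D}.
R = Cgroup("R")
A = Cgroup("A", R)
B = Cgroup("B", R, core_sched=True)
C = Cgroup("C", B)
D = Cgroup("D", B, core_sched=True)

system = Task("system", R)               # no cookie at all
loner = Task("loner", A, task_cookie=1)  # task cookie only
rt = Task("rt", C, task_cookie=2)        # task cookie inside B
worker = Task("worker", C)               # plain member of B
dtask = Task("dtask", D)                 # member of the D subtree
```

In this model loner.clear() makes loner indistinguishable from system,
so the task-only constraint is gone. rt.clear() only drops rt back to
B's cookie: it can then share with worker, but still not with system,
and not with dtask either since D restricts further.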

Whether this is actually needed, I've no clue; IMO all of cgroups is not
needed :-) Clearly other people feel differently about that.


Much of this would go away if CLEAR were not possible, I suppose. But
IIRC the idea was to let a task isolate itself temporarily while doing
some sensitive thing (e.g. encrypting an email) but otherwise not be
constrained. But I'm not sure I can remember all the various things
people wanted this crud for :/