Re: [PATCH 0/9] sched: Core scheduling interfaces

From: Peter Zijlstra
Date: Wed Apr 07 2021 - 14:35:36 EST


On Wed, Apr 07, 2021 at 06:50:32PM +0200, Michal Koutný wrote:
> Hello.
>
> IIUC, the premise is that the tasks that have different cookies imply
> they would never share a core.

Correct.

> On Thu, Apr 01, 2021 at 03:10:12PM +0200, Peter Zijlstra wrote:
> > The cgroup interface now uses a 'core_sched' file, which still takes 0,1. It is
> > however changed such that you can have nested tags. Then for any given task, the
> > first parent with a cookie is the effective one. The rationale is that this way
> > you can delegate subtrees and still allow them some control over grouping.
>
> Given the existence of prctl and clone APIs, I don't see the reason to
> have a separate cgroup-bound interface too (as argued by Tejun).

IMO as long as cgroups have that tasks file, you get to support people
using it. That means that tasks joining your cgroup need to 'inherit'
cgroup properties.

That's not something covered by either prctl or clone.

> The potential speciality is the possibility to re-tag whole groups of
> processes at runtime (but the listed use cases [1] don't require that
> and it's not such a good idea given its asynchronicity).

That seems to be the implication of having that tasks file. Tasks can
join a cgroup, so you get to deal with that.

You can't just say, don't do that then.

> Also, I would find useful some more explanation how the hierarchical
> behavior is supposed to work. In my understanding the task is either
> allowed to request its own isolation or not. The cgroups could be used
> to restrict this privilege, however, that doesn't seem to be the case
> here.

Given something like:

        R
       / \
      A   B
         / \
        C   D

B group can set core_sched=1 and then all of its tasks (and those of its
descendants) get the same (group) cookie and cannot share a core with
others.

If however B is a delegate and has a subgroup D that is security
sensitive and must not share core resources with the rest of B, then it
can also set D.core_sched=1, such that D (and its descendants) will have
another (group) cookie.

On top of this, say C has a Real-Time task that wants to limit SMT
interference, then it can set a (task/prctl) cookie on itself, such that
it will not share the core with the rest of the tasks of B.


In that scenario the D subtree is restricted with respect to the rest of
the B subtree: tasks in D don't share a core with the other tasks of B.

And all of B is a restriction on all its tasks, including the Real-Time
task that set a task cookie, in that none of them can share with tasks
outside of B (including system tasks which are in R), irrespective of
what they do with their task cookie.
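
To make the scenario concrete, here is a toy model of the rules above. It
is only a sketch and not kernel code: the Group/Task classes, the use of
the group name as a stand-in cookie value, and the choice to model the
effective cookie as a (group cookie, task cookie) pair are all assumptions
made for illustration. The task names are hypothetical as well.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Group:
    name: str
    parent: Optional["Group"] = None
    core_sched: int = 0  # models the 0/1 written to the 'core_sched' file

    def group_cookie(self) -> Optional[str]:
        """Walk up from this group; the first tagged group is the effective one."""
        g: Optional[Group] = self
        while g is not None:
            if g.core_sched:
                return g.name  # stand-in for a unique per-group cookie
            g = g.parent
        return None  # no tagged group on the path to the root


@dataclass
class Task:
    name: str
    group: Group
    task_cookie: Optional[int] = None  # models a cookie set via prctl/clone

    def effective_cookie(self) -> Tuple[Optional[str], Optional[int]]:
        # Assumption: the group and task cookies combine as a pair, so a
        # task cookie can only narrow sharing within one group cookie.
        return (self.group.group_cookie(), self.task_cookie)


def can_share_core(a: Task, b: Task) -> bool:
    """Two tasks may share a core only if their effective cookies match."""
    return a.effective_cookie() == b.effective_cookie()


def build_example() -> Dict[str, Task]:
    """Build the R/A/B/C/D tree with B and D tagged, plus a few tasks."""
    r = Group("R")
    a = Group("A", parent=r)
    b = Group("B", parent=r, core_sched=1)
    c = Group("C", parent=b)
    d = Group("D", parent=b, core_sched=1)
    tasks = [
        Task("sys", r),                       # untagged system task in R
        Task("sys_tagged", r, task_cookie=1),  # task in R with a task cookie
        Task("a", a),
        Task("b", b),
        Task("c", c),
        Task("rt", c, task_cookie=1),          # the Real-Time task in C
        Task("d", d),
    ]
    return {t.name: t for t in tasks}
```

In this model "b" and "c" may share a core, "d" shares with neither, and
"rt" shares with nothing else in B. Note that "rt" and "sys_tagged" carry
the same task cookie value yet still cannot share, because their group
cookies differ; that is the "irrespective of what they do with their task
cookie" part.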