Re: [RFC 00/60] Coscheduling for Linux

From: Frederic Weisbecker
Date: Fri Oct 19 2018 - 10:52:30 EST

On Fri, Oct 19, 2018 at 01:40:03PM +0200, Jan H. Schönherr wrote:
> On 17/10/2018 04.09, Frederic Weisbecker wrote:
> > On Fri, Sep 07, 2018 at 11:39:47PM +0200, Jan H. Schönherr wrote:
> >> C) How does it work?
> >> --------------------
> [...]
> >> For each task-group, the user can select at which level it should be
> >> scheduled. If you set "cpu.scheduled" to "1", coscheduling will typically
> >> happen at core-level on systems with SMT. That is, if one SMT sibling
> >> executes a task from this task group, the other sibling will do so, too. If
> >> no task is available, the SMT sibling will be idle. With "cpu.scheduled"
> >> set to "2" this is extended to the next level, which is typically a whole
> >> socket on many systems. And so on. If you feel that this does not provide
> >> enough flexibility, you can specify "cosched_split_domains" on the kernel
> >> command line to create more fine-grained scheduling domains for your
> >> system.
> >
> > Have you considered using cpuset to specify the set of CPUs inside which
> > you want to coschedule task groups in? Perhaps that would be more flexible
> > and intuitive to control than this cpu.scheduled value.
> Yes, I did consider cpusets. Though, there are two dimensions to it:
> a) at what fraction of the system tasks shall be coscheduled, and
> b) where these tasks shall execute within the system.
> cpusets would be the obvious answer to the "where". However, in their current
> form they are too inflexible with too much overhead. Suppose you want to
> coschedule two tasks on SMT siblings of a core. You would be able to
> restrict the tasks to a specific core with a cpuset. But then, it is bound
> to that core, and the load balancer cannot move the group of two tasks to a
> different core.
> Now, it would be possible to "invent" relocatable cpusets to address that
> issue ("I want affinity restricted to a core, I don't care which"), but
> then, the way cpuset affinity is currently enforced doesn't scale for
> making use of it from within the balancer. (The upcoming load balancing
> portion of the coscheduler currently uses a file similar to cpu.scheduled
> to restrict affinity to a load-balancer-controlled subset of the system.)

Oh ok, I understand now. Affinity and node-scope mutual exclusion are
entirely decoupled, I see.

> Using cpusets as the means to describe which parts of the system are to be
> coscheduled *may* be possible. But if so, it's a long way out. The current
> implementation uses scheduling domains for this, because (a) most
> coscheduling use cases require an alignment to the topology, and (b) it
> integrates really nicely with the load balancer.

So what is the need for cosched_split_domains? What kind of corner case won't
fit into scheduler domains? Can you perhaps leave that part out of this patchset
to simplify it somehow? If it happens to be necessary, it can still be added