Re: [PATCH v6 2/2] cpuset: Add cpuset.sched_load_balance to v2

From: Juri Lelli
Date: Mon Mar 26 2018 - 08:47:23 EST


On 23/03/18 14:44, Waiman Long wrote:
> On 03/23/2018 03:59 AM, Juri Lelli wrote:

[...]

> > OK, thanks for confirming. Can you tell me again, however, why you think
> > we need to remove sched_load_balance from root level? Won't we end up
> > having tasks put on isolated sets?
>
> The root cgroup is special in that it owns all the resources in the system.
> We generally don't want restrictions to be put on the root cgroup. A child
> cgroup has to be created to have constraints put on it. In fact, most of
> the controller files don't show up in the v2 cgroup root at all.
>
> An isolated cgroup has to be put under root, e.g.
>
>        Root
>       /    \
> isolated  balanced
>
> >
> > Also, I guess children groups with more than one CPU will need to be
> > able to load balance across their CPUs, no matter what their parent
> > group does?
>
> The purpose of an isolated cpuset is to have a dedicated set of CPUs to
> be used by a certain application that makes its own scheduling decision
> by placing tasks explicitly on specific CPUs. It just doesn't make sense
> to have a CPU in an isolated cpuset participate in load balancing in
> another cpuset. If one wants load balancing in a child cpuset, the parent
> cpuset should have load balancing turned on as well.

Isolated with CPUs overlapping some other cpuset makes little sense, I
agree. What I have in mind however is an isolated set of CPUs that don't
overlap with any other cpuset (as your balanced set above). In this case
I think it makes sense to let the sysadmin decide whether "automatic" load
balancing is performed (by the scheduler) or whether no load balancing
takes place at all.

Further extending your example:

           Root [0-3]
          /          \
group1 [0-1]      group2 [2-3]

Why should we prevent load balancing from being disabled at root level (so
that, for example, tasks still residing in the root group are not freely
migrated around, potentially disturbing both sub-groups)?

Then one can decide that group1 is a "userspace managed" group (no load
balancing takes place) and group2 is balanced by the scheduler.
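
To make that setup concrete, here is a rough sketch of the writes it would
take. It assumes the v1-style file names (cpuset.cpus and
cpuset.sched_load_balance) and a hypothetical mount point; the helper name
plan_writes is made up for illustration, and cpuset.mems setup is left out.
It only computes the writes, it does not touch the filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical mount point of a v1-style cpuset hierarchy (an assumption).
CPUSET_ROOT = PurePosixPath("/sys/fs/cgroup/cpuset")


def plan_writes(root=CPUSET_ROOT):
    """Return the ordered (file, value) writes for the layout above."""
    return [
        # Root: stop the scheduler from balancing across all of CPUs 0-3.
        (root / "cpuset.sched_load_balance", "0"),
        # group1: CPUs 0-1, "userspace managed", no load balancing.
        (root / "group1" / "cpuset.cpus", "0-1"),
        (root / "group1" / "cpuset.sched_load_balance", "0"),
        # group2: CPUs 2-3, balanced by the scheduler.
        (root / "group2" / "cpuset.cpus", "2-3"),
        (root / "group2" / "cpuset.sched_load_balance", "1"),
    ]


if __name__ == "__main__":
    # Print the equivalent shell commands instead of writing anything.
    for path, value in plan_writes():
        print(f"echo {value} > {path}")
```

The root-level write is the one that would no longer be possible if
sched_load_balance is removed from the root cgroup.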

And this is not DEADLINE specific, IMHO.

> As I look into the code, it seems like root domain is probably somewhat
> associated with cpu_exclusive only. Whether sched_load_balance is set
> doesn't really matter. I will need to look further on the conditions
> where a new root domain is created.

I checked again myself (sched domains code is always a maze :) and I
believe that the sched_load_balance flag indeed controls the creation and
configuration of domains (sched and root). Changing the flag potentially
triggers a rebuild, and separate sched/root domains are generated if
subgroups have non-overlapping cpumasks. cpu_exclusive only enforces this
latter condition.
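
As a way to reason about this, here is a toy model of how I read that
behaviour. It is not the kernel code: build_domains is a made-up helper, it
only looks at one level of children, and it ignores everything else that
feeds into the real domain rebuild. The assumptions are: if the root has
the flag set, there is one domain spanning all CPUs; otherwise each child
with the flag set contributes its cpumask, and overlapping masks are merged
into a single domain.

```python
def build_domains(root_cpus, root_balance, children):
    """Toy model: return the list of load-balancing domains.

    root_cpus    -- set of CPU ids owned by the root group
    root_balance -- sched_load_balance flag of the root group
    children     -- list of (cpus, balance) tuples, one per child group
    """
    if root_balance:
        # Balancing at root level: one domain spanning every CPU.
        return [frozenset(root_cpus)]

    domains = []  # kept pairwise disjoint at all times
    for cpus, balance in children:
        if not balance or not cpus:
            continue  # this group does no scheduler load balancing
        merged = set(cpus)
        kept = []
        for dom in domains:
            if dom & merged:
                merged |= dom      # overlapping masks share one domain
            else:
                kept.append(dom)   # disjoint masks stay separate
        kept.append(merged)
        domains = kept
    return sorted((frozenset(d) for d in domains), key=min)


if __name__ == "__main__":
    root = {0, 1, 2, 3}
    groups = [({0, 1}, False), ({2, 3}, True)]  # group1, group2
    print(build_domains(root, True, groups))
    print(build_domains(root, False, groups))
```

In this model, with the flag cleared at root, group1 ends up in no domain
at all while group2 gets its own, which is the configuration described
above.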

Best,

- Juri