Re: [PATCH v7 2/5] cpuset: Add cpuset.sched_load_balance to v2

From: Peter Zijlstra
Date: Wed May 02 2018 - 06:24:36 EST


On Thu, Apr 19, 2018 at 09:47:01AM -0400, Waiman Long wrote:
> + cpuset.sched_load_balance
> + A read-write single value file which exists on non-root cgroups.

Uhhm.. it should very much exist in the root group too. Otherwise you
cannot disable it there, and disabling it in the root is required to
allow smaller groups to load-balance only among their own CPUs.

> + The default is "1" (on), and the other possible value is "0"
> + (off).
> +
> + When it is on, tasks within this cpuset will be load-balanced
> + by the kernel scheduler. Tasks will be moved from CPUs with
> + high load to other CPUs within the same cpuset with less load
> + periodically.
> +
> + When it is off, there will be no load balancing among CPUs on
> + this cgroup. Tasks will stay in the CPUs they are running on
> + and will not be moved to other CPUs.
> +
> + This flag is hierarchical and is inherited by child cpusets. It
> + can be turned off only when the CPUs in this cpuset aren't
> + listed in the cpuset.cpus of other sibling cgroups, and all
> + the child cpusets, if present, have this flag turned off.
> +
> + Once it is off, it cannot be turned back on as long as the
> + parent cgroup still has this flag in the off state.

That too is wrong and broken. You explicitly want to be able to turn it
on for children while the parent has it turned off.

So the idea is that you can have:

    R
   / \
  A   B

With:

R cpus=0-3, load_balance=0
A cpus=0-1, load_balance=1
B cpus=2-3, load_balance=1

This allows all tasks in A and B (and their children) to load-balance
across CPUs 0-1 and 2-3 respectively, with no balancing between A and B.

If you don't allow the root group to disable load_balance, the root will
always be the largest load-balanced group, and load-balancing will always
happen system-wide.
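To make the R/A/B example concrete, here is a toy Python model of how the flags partition CPUs into balancing domains. It is a rough sketch, loosely inspired by the partitioning that generate_sched_domains() performs in the kernel's cpuset code, and it is not that implementation. The class Cpuset and the functions balanced_cpusets() and sched_domains() are hypothetical names used only for illustration. The model assumes that the topmost cpuset with balancing on defines a domain, that overlapping domains are merged, and that CPUs outside every balanced cpuset get no domain. It ignores offline CPUs, relax_domain_level and everything else the real code handles.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Cpuset:
    # Hypothetical model of a cpuset: its CPUs, its load_balance flag, its children.
    name: str
    cpus: FrozenSet[int]
    load_balance: bool = True
    children: List["Cpuset"] = field(default_factory=list)


def balanced_cpusets(cs: Cpuset) -> List[FrozenSet[int]]:
    """Collect the CPU sets of the topmost cpusets that have balancing on."""
    if not cs.cpus:
        return []
    if cs.load_balance:
        # A balanced cpuset already covers all of its descendants' CPUs,
        # so there is no need to look further down this branch.
        return [cs.cpus]
    # Balancing is off here: only the children can contribute domains.
    result: List[FrozenSet[int]] = []
    for child in cs.children:
        result.extend(balanced_cpusets(child))
    return result


def sched_domains(root: Cpuset) -> List[FrozenSet[int]]:
    """Merge overlapping balanced CPU sets into disjoint domains."""
    domains: List[FrozenSet[int]] = []  # invariant: pairwise disjoint
    for cpus in balanced_cpusets(root):
        merged = set(cpus)
        remaining: List[FrozenSet[int]] = []
        for d in domains:
            if d & merged:
                merged |= d  # overlapping sets must share one domain
            else:
                remaining.append(d)
        remaining.append(frozenset(merged))
        domains = remaining
    return sorted(domains, key=min)


# The example from above: R has balancing off, A and B have it on.
A = Cpuset("A", frozenset({0, 1}))
B = Cpuset("B", frozenset({2, 3}))
R = Cpuset("R", frozenset({0, 1, 2, 3}), load_balance=False, children=[A, B])
print(sched_domains(R))  # [frozenset({0, 1}), frozenset({2, 3})]

# With balancing left on in the root, the root swallows everything.
R.load_balance = True
print(sched_domains(R))  # [frozenset({0, 1, 2, 3})]
```

The second call shows the point made above. As long as the root's flag is on, the root is the largest balanced group, so the model yields a single system-wide domain no matter what A and B ask for.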