Re: [PATCH v4] cpuset: Enable cpuset controller in default hierarchy

From: Waiman Long
Date: Mon Mar 12 2018 - 10:20:40 EST


On 03/10/2018 08:16 AM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 06:06:29PM -0500, Waiman Long wrote:
>> So you are talking about sched_relax_domain_level and
> That one I wouldn't be sad to see the back of.
>
>> sched_load_balance.
> This one, that's critical. And this is the perfect time to try and fix
> the whole isolcpus issue.
>
> The primary issue is that to make equivalent functionality available
> through cpuset, we need to basically start all tasks outside the root
> group.
>
> The equivalent of isolcpus=xxx is a cgroup setup like:
>
>         root
>         /  \
>    system  other
>
> Where other has the @xxx cpus and system the remainder and
> root.sched_load_balance = 0.

I saw in the kernel-parameters.txt file that the isolcpus option is
deprecated in favor of cpusets. However, there doesn't seem to be any
documentation on the right way to do it. Of course, we can achieve
similar results with what you have outlined above, but the process is
more complex than just adding another boot command line argument with
isolcpus. So I doubt isolcpus will die anytime soon unless we can make
the alternative as easy to use.
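
To illustrate the extra work, here is a rough sketch of the setup you
outlined, written against the v1 cpuset control files (cpuset.cpus,
cpuset.mems, cpuset.sched_load_balance). The helper names, the "system"
and "other" directory names, and the default memory node "0" are only
assumptions for the example. Moving the existing tasks out of the root
cgroup is left out, and that is the harder part.

```python
import os


def write_knob(cgroup_dir, name, value):
    """Write a single value to a cgroup control file."""
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(str(value))


def setup_isolation(root, isolated_cpus, system_cpus, mems="0"):
    """Create 'system' and 'other' child cpusets under root.

    root is the cpuset hierarchy mount point; /sys/fs/cgroup/cpuset is
    a common location, but that is an assumption, not a given.
    """
    for name, cpus in (("system", system_cpus), ("other", isolated_cpus)):
        child = os.path.join(root, name)
        os.makedirs(child, exist_ok=True)
        # A v1 cpuset needs both cpus and mems before tasks can join it.
        write_knob(child, "cpuset.cpus", cpus)
        write_knob(child, "cpuset.mems", mems)
    # Stop load balancing across the whole machine at the root level,
    # which is the root.sched_load_balance = 0 step described above.
    write_knob(root, "cpuset.sched_load_balance", 0)
```

On a 4-CPU box, setup_isolation("/sys/fs/cgroup/cpuset", "2-3", "0-1")
would correspond to isolcpus=2,3, except that every existing task still
has to be written into system/tasks afterwards.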

> Back before cgroups (and the new workqueue stuff), we could've started
> everything in the !root group, no worry. But now that doesn't work,
> because a bunch of controllers can't deal with that and everything
> cgroup expects the cgroupfs to be empty on boot.

AFAIK, all processes belong to the root cgroup on boot. The root cgroup
is usually special in that a controller may not exert any control over
the processes in it; many controllers become active only for processes
in child cgroups. Would you mind elaborating on what doesn't quite work
currently?


> It's one of my biggest regrets that I didn't 'fix' this before cgroups
> came along.
>
>> I have not removed any bits. I just haven't exposed
>> them yet. It does seem like these 2 control knobs are useful from the
>> scheduling perspective. Do we also need cpu_exclusive or just the two
>> sched control knobs are enough?
> I always forget if we need exclusive for load_balance to work; I'll
> peruse the document/code.

I think the cpu_exclusive feature can be useful to enforce that CPUs
allocated to the "other" isolated cgroup cannot be used by the processes
under the "system" parent.

I know that there is special code to handle the isolcpus option. How
about changing it to create an exclusive cpuset automatically instead?
Applications that need to run on those isolated CPUs can then be moved
into the isolated cgroup through the standard cgroup interface. For
example,

isolcpus=<cpuset-name>,<cpu-id-list>

or

isolcpuset=<cpuset-name>[,cpu:<cpu-id-list>][,mem:<memory-node-list>]

We can then retire the old usage and encourage users to use the cgroup
API to manage it.
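
To make the second form concrete, here is a small sketch of how such an
argument could be split up. The syntax is only the proposal above and
nothing like it exists today. The function name is made up, and I am
assuming that a comma-separated item without a "cpu:" or "mem:" prefix
continues the preceding list, so that CPU lists such as 2,3,5-7 still
work.

```python
def parse_isolcpuset(arg):
    """Split '<name>[,cpu:<list>][,mem:<list>]' into (name, cpus, mems).

    The lists are returned as strings in the usual kernel list syntax;
    a missing list comes back as an empty string.
    """
    tokens = arg.split(",")
    name = tokens[0]
    if not name or ":" in name:
        raise ValueError("missing cpuset name")
    lists = {"cpu": [], "mem": []}
    current = None
    for tok in tokens[1:]:
        key, sep, rest = tok.partition(":")
        if sep:
            # A prefixed token starts a new list.
            if key not in lists:
                raise ValueError("unknown key: " + key)
            current, tok = key, rest
        elif current is None:
            raise ValueError("list item before cpu: or mem: prefix")
        if tok:
            lists[current].append(tok)
    return name, ",".join(lists["cpu"]), ",".join(lists["mem"])
```

With that, "rt,cpu:2,3,5-7,mem:0" gives the name "rt", the CPU list
"2,3,5-7" and the memory node list "0".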

Cheers,
Longman