Re: [PATCH v3 0/7] cpuset: implement sane hierarchy behaviors

From: Li Zefan
Date: Thu Jun 13 2013 - 03:05:25 EST


On 2013/6/10 0:03, Tejun Heo wrote:
> Hello, Li.
>
> On Sun, Jun 09, 2013 at 05:14:02PM +0800, Li Zefan wrote:
>> v2 -> v3:
>> Currently some cpuset behaviors are not friendly when cpuset is co-mounted
>> with other cgroup controllers.
>>
>> Now with this patchset if cpuset is mounted with sane_behavior option, it
>> behaves differently:
>>
>> - Tasks will be kept in empty cpusets when hotplug happens and take masks
>> of ancestors with non-empty cpus/mems, instead of being moved to an ancestor.
>>
>> - A task can be moved into an empty cpuset, and again it takes masks of
>> ancestors, so the user can drop a task into a newly created cgroup without
>> having to do anything for it.
>
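
To make the fallback rule in the two points above concrete, here is a
minimal Python sketch. It is not kernel code: the dict layout and the
name effective_cpus() are made up for illustration, and it assumes an
empty cpuset simply takes the mask of its nearest ancestor with a
non-empty one.

```python
# Hypothetical model: a cpuset is a dict holding its configured CPUs
# (a set) and a reference to its parent (None for the root).
def effective_cpus(cs):
    """Return the CPUs a task in `cs` may use.

    An empty cpuset borrows the mask of the nearest ancestor whose
    CPU set is non-empty (assumed reading of the description above).
    """
    while cs is not None and not cs["cpus"]:
        cs = cs["parent"]
    return set(cs["cpus"]) if cs is not None else set()


root = {"cpus": {0, 1, 2, 3}, "parent": None}
a = {"cpus": {2, 3}, "parent": root}
b = {"cpus": set(), "parent": a}  # newly created, nothing configured yet

print(sorted(effective_cpus(b)))  # [2, 3]
```

If a's CPUs were later removed by hotplug, the same lookup for b would
fall through to the root's mask, and the tasks would stay where they are.
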
> I applied 1-2 and the rest of the series also looks correct to me and
> seems like a step in the right direction; however, I'm not quite sure
> this is the final interface we want.
>
> * cpus/mems_allowed changing as CPUs go up and down is nasty. There
> should be separation between the configured CPUs and currently
> available CPUs. The current behavior makes sense when coupled with
> the irreversible task migration and all. If we're allowing tasks to
> remain in empty cpusets, it only makes sense to retain and re-apply
> configuration as CPUs come back online.
>
> I find the original behavior of changing configurations as system
> state changes pretty weird especially because it's happening without
> any notification making it pretty difficult to use in any sort of
> automated way - anything which wants to wrap cpuset would have to
> track the configuration and CPU/nodes up/down states separately on
> its own, which is a very easy way to introduce incoherencies.
>
> * validate_change() rejecting updates to config if any of its
> descendants are using some of the affected CPUs/nodes is weird. The
> config change should be enforced in hierarchical manner too. If the
> parent drops some CPUs, it should simply drop those CPUs from the
> children. The same applies in the other direction: children having
> configs which aren't fully contained inside their parents' is fine
> as long as the effective masks are correct.
>

I've just checked the other cgroup controllers, and they do behave the
way you described. So yeah, it makes sense for cpuset to behave
consistently with them.
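
A rough Python sketch of that hierarchical enforcement, as I read the
description above: each level's effective mask is its own config
clipped by its parent's effective mask, and no config change is
rejected. The function name effective_masks() is hypothetical, and the
sketch does not decide what should happen when the result is empty.

```python
def effective_masks(path):
    """Compute effective CPU sets along one root-to-leaf path.

    `path` lists the configured CPU sets from the root down. Each
    level's effective set is its own config intersected with the
    parent's effective set, so a child's config need not be contained
    in its parent's.
    """
    result = []
    allowed = None
    for configured in path:
        allowed = set(configured) if allowed is None else allowed & configured
        result.append(allowed)
    return result


# The child is configured with CPU 5, which its parent does not have.
before = effective_masks([{0, 1, 2, 3}, {0, 1, 2}, {1, 2, 5}])
# The parent drops CPU 2; nothing is rejected, the child just shrinks.
after = effective_masks([{0, 1, 2, 3}, {0, 1}, {1, 2, 5}])

print(sorted(before[-1]), sorted(after[-1]))  # [1, 2] [1]
```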

> IOW, validate_change() doesn't really make sense if we're keeping
> tasks in empty cgroups. As CPUs go down and up, we'd keep the
> organization but lose the configuration, which is just weird.
>
> I think what we want is expanding on this patchset so that we have
> separate "configured" and "effective" masks, which are preferably
> exposed to userland, and just let the config propagation deal with
> computing the effective masks as CPUs/nodes go down/up and as the
> config changes. The code actually could be simpler that way although
> there'll be complications due to the old behaviors.
>
> What do you think? If you agree, how should we proceed? We can apply
> these patches and build on top if you prefer.
>

I would prefer that those patches be applied first: the new changes can
be based on this patchset and should be quite straightforward, and I
won't have to rebase those patches again.
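
For reference, the "configured" vs "effective" split discussed above
could look roughly like this in Python. This is only an illustration
under assumed semantics, not the kernel design: the class Cpuset and
its effective() method are hypothetical names, and the hierarchy is
left out to keep the hotplug round trip visible.

```python
class Cpuset:
    """Toy cpuset that keeps user configuration apart from system state."""

    def __init__(self, configured):
        # What the user asked for; hotplug never modifies this.
        self.configured = frozenset(configured)

    def effective(self, online):
        # What tasks can actually use right now: config clipped by the
        # set of currently online CPUs.
        return self.configured & online


cs = Cpuset({2, 3})
print(sorted(cs.effective({0, 1, 2, 3})))  # [2, 3]
print(sorted(cs.effective({0, 1, 2})))     # [2]       CPU 3 went offline
print(sorted(cs.effective({0, 1, 2, 3})))  # [2, 3]    CPU 3 is back
```

Because the configured set is never rewritten, the original setting is
re-applied as soon as the CPU comes back online, and a wrapper tool
would not have to track hotplug state on its own.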
