Re: [RFC] cpuset: Enable changing of top_cpuset's mems_allowed nodemask

From: Anshuman Khandual
Date: Wed Feb 01 2017 - 02:31:48 EST


On 01/31/2017 09:30 PM, Mel Gorman wrote:
> On Tue, Jan 31, 2017 at 07:52:37PM +0530, Anshuman Khandual wrote:
>> At present, top_cpuset.mems_allowed is the same as node_states[N_MEMORY]
>> and cannot be changed at runtime. The maximum possible
>> node_states[N_MEMORY] is also reflected in the
>> top_cpuset.effective_mems interface. This prevents anyone from using
>> the cpuset mechanism to remove or restrict memory placement on a given
>> memory node system wide, which can be limiting. This patch solves the
>> problem by enabling update_nodemask() to accept changes to
>> top_cpuset.mems_allowed as well. Once changed, it also updates
>> top_cpuset.effective_mems and the mems_allowed nodemask of all its
>> tasks. It calls cpuset_inc() to make sure the cpuset is accounted for
>> in the buddy allocator through the cpusets_enabled() check.
>>
>
> What's the point of allowing the root cpuset to be restricted?

After a system has been running for an extended period, if we need to
run HW diagnostics and dumps (which run out of band) for debugging, we
currently have to stop further allocations to the node. Hot-unplugging
the memory node from the kernel achieves this. But the same can be
done by enabling top_cpuset.memory_migrate and then removing the node
from the top_cpuset.mems_allowed nodemask, which restricts all
allocations. This will also force all the existing allocations out of
the target node.
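
As a rough userspace illustration of that sequence, here is a minimal
sketch. It assumes this RFC is applied (so that the root cpuset's mems
file accepts writes), a legacy cpuset hierarchy mounted at
/sys/fs/cgroup/cpuset, and the usual cpuset.mems and
cpuset.memory_migrate file names. The mount point and the helper names
parse_nodelist(), format_nodelist() and drain_node() are hypothetical
and not part of the patch.

```python
import os


def parse_nodelist(text):
    """Parse a kernel-style node list such as "0-3,8" into sorted ints."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def format_nodelist(nodes):
    """Format ints back into compact range form, e.g. [0, 1, 3] -> "0-1,3"."""
    nodes = sorted(set(nodes))
    parts = []
    i = 0
    while i < len(nodes):
        j = i
        # Extend j to the end of the current run of consecutive node ids.
        while j + 1 < len(nodes) and nodes[j + 1] == nodes[j] + 1:
            j += 1
        parts.append(str(nodes[i]) if i == j else f"{nodes[i]}-{nodes[j]}")
        i = j + 1
    return ",".join(parts)


def drain_node(node, root="/sys/fs/cgroup/cpuset"):
    """Enable memory_migrate, then drop `node` from the root cpuset's mems.

    `root` is an assumed mount point; writing the root cpuset's mems file
    only succeeds on a kernel carrying the change proposed in this RFC.
    """
    mems_path = os.path.join(root, "cpuset.mems")
    with open(mems_path) as f:
        current = parse_nodelist(f.read())
    remaining = [n for n in current if n != node]
    if not remaining:
        raise ValueError("refusing to leave the cpuset with no memory nodes")
    # Enable migration first so existing pages follow the new nodemask.
    with open(os.path.join(root, "cpuset.memory_migrate"), "w") as f:
        f.write("1")
    with open(mems_path, "w") as f:
        f.write(format_nodelist(remaining))
    return format_nodelist(remaining)
```

Calling drain_node(2) on a system whose root cpuset currently has mems
"0-3" would write "0-1,3". The ordering (memory_migrate before mems)
mirrors the steps described above.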

More importantly, it extends the cpuset memory restriction feature to
its logical completion without adding any regressions for the existing
use cases. So why not do this? Does it add any overhead?

In the future this feature can also be used to isolate a memory node
from all general allocations while providing an alternate method for
explicit allocation into it (I am still working on this part, though I
have a hack right now). The current RFC series proposes one such use
case through the top_cpuset.mems_allowed nodemask. In that case,
however, the nodemask is restricted during boot as well as after
hotplug of a memory-only NUMA node.

If you think this currently does not have a use case to stand on its
own, then I will carry it along with this patch series as part of the
proposed cpuset-based isolation solution (with explicit allocation
access to the isolated node) as described just above.

- Anshuman