Re: MPOL_BIND on memory only nodes

From: Anshuman Khandual
Date: Wed Oct 12 2016 - 06:39:53 EST

On 10/12/2016 03:13 PM, Michal Hocko wrote:
> On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
>> Hi,
>> We have the following function policy_zonelist() which selects a zonelist
>> during various allocation paths. With this, general user space allocations
>> (IIUC might not have __GFP_THISNODE) fail while trying to get memory from
>> a memory only node without CPUs, as the application runs somewhere else
>> and that node is not part of the nodemask.

My bad. I was playing with some changes to how the zonelists are rebuilt
after a memory node hotplug, and to the order of the zones in them.

> I am not sure I understand. So you have a task with MPOL_BIND without a
> cpu less node in the mask and you are wondering why the memory is not
> allocated from that node?

In my experiment, there is an MPOL_BIND call with a CPU-less node in
the node mask, and the memory is not allocated from that CPU-less node.
That's because the zone of the CPU-less node was absent from the
FALLBACK zonelist of the local node.
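
To illustrate the failure, here is a minimal user-space sketch, not
kernel code. The names model_zonelist and model_first_allowed_node, and
the plain bitmask standing in for the nodemask, are made up for this
example. It assumes the allocation walks the local node's FALLBACK
zonelist in order and only takes a node that the MPOL_BIND mask allows.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/*
 * Hypothetical stand-in for a node's FALLBACK zonelist: just the node
 * IDs in preference order (real zonelists hold zones, not nodes).
 */
struct model_zonelist {
    int    nodes[MAX_NODES];
    size_t len;
};

/*
 * Walk the zonelist in order and return the first node that is also
 * set in allowed_mask (bit N set means node N is allowed).
 * Returns NO_NODE when no listed node is allowed.
 */
static int model_first_allowed_node(const struct model_zonelist *zl,
                                    unsigned int allowed_mask)
{
    assert(zl->len <= MAX_NODES);

    for (size_t i = 0; i < zl->len; i++) {
        int nid = zl->nodes[i];

        if (allowed_mask & (1u << nid))
            return nid;
    }
    return NO_NODE;
}
```

With node 2 as the CPU-less node and a bind mask of (1u << 2), a
zonelist of {0, 1} yields NO_NODE, while {0, 1, 2} yields node 2. That
matches what I saw: once the zone is missing from the local node's
FALLBACK zonelist, the nodemask alone cannot bring it back.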

>> Why we insist on __GFP_THISNODE ?
> AFAIU __GFP_THISNODE just overrides the given node to the policy
> nodemask in case the current node is not part of that node mask. In
> other words we are ignoring the given node and use what the policy says.

Right, but only when the gfp flags include __GFP_THISNODE. Without
__GFP_THISNODE, the node from the nodemask is not selected. I still
wonder why. Can we always go to the first node in the nodemask for
MPOL_BIND interface calls? Just curious to know why preference is
given to the local node and its FALLBACK zonelist.

> I can see how this can be confusing especially when confronting the
> documentation:
> * __GFP_THISNODE forces the allocation to be satisified from the requested
> * node with no fallbacks or placement policy enforcements.

Yeah, right.

Thanks for your reply.