Re: MPOL_BIND on memory only nodes

From: Michal Hocko
Date: Thu Oct 13 2016 - 08:39:26 EST


On Thu 13-10-16 11:24:59, Mel Gorman wrote:
> On Wed, Oct 12, 2016 at 03:16:27PM +0200, Michal Hocko wrote:
> > On Wed 12-10-16 11:43:37, Michal Hocko wrote:
> > > On Wed 12-10-16 14:55:24, Anshuman Khandual wrote:
> > [...]
> > > > Why do we insist on __GFP_THISNODE?
> > >
> > > AFAIU __GFP_THISNODE just overrides the given node with one from the
> > > policy nodemask in case the current node is not part of that nodemask.
> > > In other words, we ignore the given node and use what the policy says.
> > > I can see how this can be confusing, especially when compared with the
> > > documentation:
> > >
> > > * __GFP_THISNODE forces the allocation to be satisfied from the requested
> > > * node with no fallbacks or placement policy enforcements.
> >
> > You made me think and look into this deeper. I came to the conclusion
> > that this is actually a relic from the past. policy_zonelist is called
> > only from 3 places:
> > - huge_zonelist - should never use __GFP_THISNODE on this path
> > - alloc_pages_vma - which shouldn't depend on __GFP_THISNODE either
> > - alloc_pages_current - which uses default_policy if __GFP_THISNODE is
> >   used
> >
> > So AFAICS this is essentially dead code, or I am missing something. Mel,
> > do you remember why we needed it in the past?
>
> I don't recall a specific reason. It was likely due to confusion on my
> part at the time on the exact use of __GFP_THISNODE. The expectation is
> that the flag is not used in fault paths or with policies. It's meant to
> enforce node-locality for kernel internal decisions such as the locality
> of slab pages and ensuring that a THP collapse from khugepaged is on the
> same node.

This is my understanding as well. Thanks for double checking. I will
send a proper patch (it will even compile as a bonus point ;).
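
For reference, the override discussed above can be modelled roughly as
follows. This is only a userspace sketch, not the kernel code: the
model_* names and the 64-bit mask type are made up for illustration, and
picking the first node of the mask as the replacement is an assumption
about how the override chooses a node.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: bit n set means node n is in the policy mask. */
typedef uint64_t nodemask_model_t;

/* Return true if node nid is part of the mask. */
static bool model_node_isset(int nid, nodemask_model_t mask)
{
	return nid >= 0 && nid < 64 && ((mask >> nid) & 1u);
}

/* Return the lowest node in the mask, or -1 if the mask is empty. */
static int model_first_node(nodemask_model_t mask)
{
	for (int nid = 0; nid < 64; nid++)
		if ((mask >> nid) & 1u)
			return nid;
	return -1;
}

/*
 * Model of the node selection for a bind policy: the given node nd is
 * kept unless the "this node" flag is set and nd is outside the allowed
 * mask, in which case a node from the mask is used instead.
 * Assumption: the replacement is the first node of the mask.
 */
static int model_bind_start_node(int nd, nodemask_model_t allowed,
				 bool thisnode)
{
	if (thisnode && !model_node_isset(nd, allowed))
		return model_first_node(allowed);
	return nd;
}
```

With an allowed mask of nodes 1 and 2 (0x6), asking for node 0 with the
flag set gives node 1 in this model, i.e. the requested node is silently
ignored, which is the opposite of what the flag documentation promises.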
--
Michal Hocko
SUSE Labs