Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

From: Michal Hocko
Date: Wed Sep 26 2018 - 10:22:33 EST


On Wed 26-09-18 16:17:08, Michal Hocko wrote:
> On Wed 26-09-18 16:30:39, Kirill A. Shutemov wrote:
> > On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote:
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index c3bc7e9c9a2a..c0bcede31930 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
> > > * available
> > > * never: never stall for any thp allocation
> > > */
> > > -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
> > > +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
> > > {
> > > const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
> > > + gfp_t this_node = 0;
> > > +
> > > +#ifdef CONFIG_NUMA
> > > + struct mempolicy *pol;
> > > + /*
> > > + * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
> > > + * specified, to express a general desire to stay on the current
> > > + * node for optimistic allocation attempts. If the defrag mode
> > > + * and/or madvise hint requires the direct reclaim then we prefer
> > > + * to fallback to other node rather than node reclaim because that
> > > + * can lead to excessive reclaim even though there is free memory
> > > + * on other nodes. We expect that NUMA preferences are specified
> > > + * by memory policies.
> > > + */
> > > + pol = get_vma_policy(vma, addr);
> > > + if (pol->mode != MPOL_BIND)
> > > + this_node = __GFP_THISNODE;
> > > + mpol_cond_put(pol);
> > > +#endif
> >
> > I'm not very good with NUMA policies. Could you explain in more detail
> > how the code above is equivalent to the code below?
>
> MPOL_PREFERRED is handled by policy_node() before we call
> __alloc_pages_nodemask. __GFP_THISNODE is applied only when we are not
> using __GFP_DIRECT_RECLAIM, which is now handled in
> alloc_hugepage_direct_gfpmask. Lastly, MPOL_BIND wasn't handled
> explicitly, but in the end the removed late check would have dropped
> __GFP_THISNODE for it as well. So in the end we are doing the same
> thing, unless I am missing something.

Forgot to add: one notable exception is that the previous code could hit
WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
in policy_node() if the requested node (e.g. the CPU-local one) was outside
of the mbind nodemask. This is no longer possible. We haven't heard of any
such warning being triggered, though, so it is unlikely to happen in
practice.
--
Michal Hocko
SUSE Labs