Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask

From: Vlastimil Babka
Date: Mon Oct 22 2018 - 09:15:47 EST


On 9/26/18 4:22 PM, Michal Hocko wrote:
> On Wed 26-09-18 16:17:08, Michal Hocko wrote:
>> On Wed 26-09-18 16:30:39, Kirill A. Shutemov wrote:
>>> On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote:
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index c3bc7e9c9a2a..c0bcede31930 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>>> * available
>>>> * never: never stall for any thp allocation
>>>> */
>>>> -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
>>>> +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
>>>> {
>>>> const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
>>>> + gfp_t this_node = 0;
>>>> +
>>>> +#ifdef CONFIG_NUMA
>>>> + struct mempolicy *pol;
>>>> + /*
>>>> + * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
>>>> + * specified, to express a general desire to stay on the current
>>>> + * node for optimistic allocation attempts. If the defrag mode
>>>> + * and/or madvise hint requires the direct reclaim then we prefer
>>>> + * to fallback to other node rather than node reclaim because that
>>>> + * can lead to excessive reclaim even though there is free memory
>>>> + * on other nodes. We expect that NUMA preferences are specified
>>>> + * by memory policies.
>>>> + */
>>>> + pol = get_vma_policy(vma, addr);
>>>> + if (pol->mode != MPOL_BIND)
>>>> + this_node = __GFP_THISNODE;
>>>> + mpol_cond_put(pol);
>>>> +#endif
>>>
>>> I'm not very good with NUMA policies. Could you explain in more details how
>>> the code above is equivalent to the code below?
>>
>> MPOL_PREFERRED is handled by policy_node() before we call __alloc_pages_nodemask.
>> __GFP_THISNODE is applied only when we are not using
>> __GFP_DIRECT_RECLAIM which is handled in alloc_hugepage_direct_gfpmask
>> now.
>> Lastly MPOL_BIND wasn't handled explicitly but in the end the removed
>> late check would remove __GFP_THISNODE for it as well. So in the end we
>> are doing the same thing unless I miss something
>
> Forgot to add. One notable exception would be that the previous code
> would allow to hit
> WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
> in policy_node if the requested node (e.g. cpu local one) was outside of
> the mbind nodemask. This is not possible now. We haven't heard about any
> such warning yet so it is unlikely that it happens though.

I don't think the previous code could hit the warning, as the hugepage
path that would add __GFP_THISNODE didn't call policy_node() (containing
the warning) at all. IIRC early of your patch did hit the warning
though, which is why you added the MPOL_BIND policy check.