Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
From: Michal Hocko
Date: Tue Jul 07 2020 - 07:31:23 EST
On Tue 07-07-20 16:44:42, Joonsoo Kim wrote:
> From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>
> new_non_cma_page() in gup.c, which tries to allocate a migration target
> page, needs to allocate the new page outside of the CMA area.
> new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
> This works well for THP and normal pages, but not for hugetlb pages.
>
> The hugetlb page allocation process consists of two steps. The first is
> dequeuing from the pool. The second, if there is no available page in
> the pool, is allocating from the page allocator.
>
> new_non_cma_page() can control allocation from the page allocator by
> specifying the correct gfp flags. However, dequeuing could not be
> controlled until now, so new_non_cma_page() skips dequeuing completely.
> This is suboptimal, since new_non_cma_page() cannot utilize hugetlb
> pages already in the pool, and this patch tries to fix that.
>
> This patch makes the hugetlb dequeue function CMA aware: it skips CMA
> pages if the newly added skip_cma argument is passed as true.
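For context, the two-step flow described in the changelog corresponds
roughly to this simplified sketch of alloc_huge_page_nodemask() (not the
exact upstream code):

	struct page *page;

	/* step 1: try to dequeue a free huge page from the pool */
	spin_lock(&hugetlb_lock);
	if (h->free_huge_pages - h->resv_huge_pages > 0) {
		page = dequeue_huge_page_nodemask(h, gfp_mask,
						  preferred_nid, nmask);
		if (page) {
			spin_unlock(&hugetlb_lock);
			return page;
		}
	}
	spin_unlock(&hugetlb_lock);

	/* step 2: fall back to a fresh allocation from the buddy */
	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);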
I really dislike this, as I already mentioned in the previous version of
the patch. You are making hugetlb, and only one part of its allocator, a
special snowflake, which is almost always a bad idea. Your changelog
lacks any real justification for this inconsistency.
Also, by doing so you are keeping an existing bug: the hugetlb allocator
doesn't respect scope gfp flags, as I mentioned when reviewing the
previous version. That bug likely doesn't matter now, but it might in
the future, and as soon as it is fixed all of this becomes a pointless
exercise.
I do not have the energy and time to burn repeating that argumentation,
so I will let Mike have the final word. Btw. you are keeping his acks
even after considerable changes to the patches, which I am not really
sure he is ok with.
> Acked-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Now, to this particular patch:
[...]
> diff --git a/mm/gup.c b/mm/gup.c
> index 5daadae..2c3dab4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
> #ifdef CONFIG_HUGETLB_PAGE
> if (PageHuge(page)) {
> struct hstate *h = page_hstate(page);
> +
> /*
> * We don't want to dequeue from the pool because pool pages will
> * mostly be from the CMA region.
> */
> - return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
> + return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
Let me repeat that this whole thing is already running under
memalloc_nocma_save(), so the additional parameter is bogus.
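For reference, the longterm GUP path has roughly the shape below, a
sketch assuming the nocma scope set up by the earlier patch in the
series covers the migration call, as stated above:

	unsigned int flags;
	int rc;

	flags = memalloc_nocma_save();
	/*
	 * Everything that allocates from this context, however deep in
	 * the call chain, already avoids CMA pageblocks - including
	 * new_non_cma_page().
	 */
	rc = migrate_pages(&cma_page_list, new_non_cma_page, NULL, 0,
			   MIGRATE_SYNC, MR_CONTIG_RANGE);
	memalloc_nocma_restore(flags);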
[...]
> -static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> +static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
If you really insist on handling this at this layer then it should be
checking for PF_MEMALLOC_NOCMA instead of taking an additional
parameter.
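I.e. something along these lines (an untested sketch):

	static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
	{
		struct page *page;
		bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);

		list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
			/* honor the scope set up by memalloc_nocma_save() */
			if (nocma && is_migrate_cma_page(page))
				continue;
			if (PageHWPoison(page))
				continue;
			goto found;
		}
		return NULL;

	found:
		list_move(&page->lru, &h->hugepage_activelist);
		set_page_refcounted(page);
		h->free_huge_pages--;
		h->free_huge_pages_node[nid]--;
		return page;
	}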
[...]
> @@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>
> /* page migration callback function */
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> - nodemask_t *nmask, gfp_t gfp_mask)
> + nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
> {
> + unsigned int flags = 0;
> + struct page *page = NULL;
> +
> + if (skip_cma)
> + flags = memalloc_nocma_save();
This is pointless for a scope that is already defined up the call chain,
and fundamentally it breaks the expected use of the scope API. The
primary reason for that API to exist is to define the scope once and
have it sticky for _all_ allocation contexts. So if you have to use it
deep inside the allocator, you are doing something wrong.
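To make the intended usage concrete (a schematic example, not code from
this series):

	unsigned int flags = memalloc_nocma_save();
	/*
	 * The scope lives in current->flags, so every allocation below
	 * this point has to honor it without any parameter passing.
	 */
	page = alloc_huge_page_nodemask(h, preferred_nid, nmask, gfp_mask);
	memalloc_nocma_restore(flags);

Re-entering the scope inside the allocator, as the hunk above does, just
nests a scope that is already active at the call site.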
--
Michal Hocko
SUSE Labs