Re: [PATCH RFC 1/3] mm/hugetlb: split alloc_fresh_huge_page_node into fast and slow path

From: Michal Hocko
Date: Tue Jan 24 2017 - 11:52:15 EST


On Tue 24-01-17 15:49:02, Jia He wrote:
> This patch splits alloc_fresh_huge_page_node into 2 parts:
> - fast path without __GFP_REPEAT flag
> - slow path with __GFP_REPEAT flag
>
> Thus, if there is a server with uneven numa memory layout:
> available: 7 nodes (0-6)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 6603 MB
> node 0 free: 91 MB
> node 1 cpus:
> node 1 size: 12527 MB
> node 1 free: 157 MB
> node 2 cpus:
> node 2 size: 15087 MB
> node 2 free: 189 MB
> node 3 cpus:
> node 3 size: 16111 MB
> node 3 free: 205 MB
> node 4 cpus: 8 9 10 11 12 13 14 15
> node 4 size: 24815 MB
> node 4 free: 310 MB
> node 5 cpus:
> node 5 size: 4095 MB
> node 5 free: 61 MB
> node 6 cpus:
> node 6 size: 22750 MB
> node 6 free: 283 MB
> node distances:
> node   0   1   2   3   4   5   6
>   0:  10  20  40  40  40  40  40
>   1:  20  10  40  40  40  40  40
>   2:  40  40  10  20  40  40  40
>   3:  40  40  20  10  40  40  40
>   4:  40  40  40  40  10  20  40
>   5:  40  40  40  40  20  10  40
>   6:  40  40  40  40  40  40  10
>
> In this case node 5 has less memory and we will allocate the hugepages
> from these nodes one by one.
> After this patch, we will not trigger direct memory/kswapd reclaim for
> node 5 too early if there is enough memory in other nodes.

This description doesn't explain what the problem is, why it matters,
or how the fix actually works. Moreover, it does the opposite of what it
claims. Which brings me to another question: how has this been tested?

> Signed-off-by: Jia He <hejianet@xxxxxxxxx>
> ---
> mm/hugetlb.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c7025c1..f2415ce 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1364,10 +1364,19 @@ static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
> {
> struct page *page;
>
> + /* fast path without __GFP_REPEAT */
> page = __alloc_pages_node(nid,
> htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
> __GFP_REPEAT|__GFP_NOWARN,
> huge_page_order(h));

this does the opposite of what the comment says.
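
Presumably the changelog intends the reverse ordering, i.e. something
along these lines (just a sketch to illustrate the point, with the rest
of your flags kept as they are):

	/* fast path: a single attempt without __GFP_REPEAT */
	page = __alloc_pages_node(nid,
			htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
			__GFP_NOWARN,
			huge_page_order(h));

	/* slow path: retry with __GFP_REPEAT only if the fast path failed */
	if (!page)
		page = __alloc_pages_node(nid,
				htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
				__GFP_REPEAT|__GFP_NOWARN,
				huge_page_order(h));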

> +
> + /* slow path with __GFP_REPEAT*/
> + if (!page)
> + page = __alloc_pages_node(nid,
> + htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
> + __GFP_NOWARN,
> + huge_page_order(h));
> +
> if (page) {
> prep_new_huge_page(h, page, nid);
> }
> --
> 2.5.5
>

--
Michal Hocko
SUSE Labs