Re: [PATCH v2 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry

From: David Hildenbrand
Date: Tue Jan 25 2022 - 11:43:53 EST


On 12.01.22 14:15, Dong Aisheng wrote:
> On an ARMv7 platform with 32M pageblock(MAX_ORDER 14), we observed a

Did you actually intend to talk about pageblocks here (and below)?

I assume you have to be clearer here that you talk about the maximum
allocation granularity, which is usually bigger than actual pageblock size.

> huge number of repeat retries of CMA allocation (1k+) during booting
> when allocating one page for each of 3 mmc instance probe.
>
> This is caused by CMA now supports cocurrent allocation since commit
> a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
> The pageblock or (MAX_ORDER -1) from which we are trying to allocate
> memory may have already been acquired and isolated by others.
> Current cma_alloc() will then retry the next area by the step of
> bitmap_no + mask + 1 which are very likely within the same isolated range
> and fail again. So when the pageblock or MAX_ORDER is big (e.g. 8192),
> keep retrying in a small step become meaningless because it will be known
> to fail at a huge number of times due to the pageblock has been isolated
> by others, especially when allocating only one or two pages.
>
> Instread of looping in the same pageblock and wasting CPU mips a lot,
> especially for big pageblock system (e.g. 16M or 32M),
> we try the next MAX_ORDER_NR_PAGES directly.
>
> Doing this way can greatly mitigate the situtation.
>
> Below is the original error log during booting:
> [ 2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [ 2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [ 2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [ 2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [ 2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [ 2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [ 2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [ 2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> .... (+1K retries)
>
> After fix, the 1200+ reties can be reduced to 0.
> Another test running 8 VPU decoder in parallel shows that 1500+ retries
> dropped to ~145.
>
> IOW this patch can improve the CMA allocation speed a lot when there're
> enough CMA memory by reducing retries significantly.
>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Cc: Lecopzer Chen <lecopzer.chen@xxxxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> CC: stable@xxxxxxxxxxxxxxx # 5.11+
> Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
> Signed-off-by: Dong Aisheng <aisheng.dong@xxxxxxx>
> ---
> v1->v2:
> * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
> ---
> mm/cma.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 1c13a729d274..1251f65e2364 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -500,7 +500,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
> count, align);
> /* try again with a bit different memory target */
> - start = bitmap_no + mask + 1;
> + start = ALIGN(bitmap_no + mask + 1,
> + MAX_ORDER_NR_PAGES >> cma->order_per_bit);

Mind giving the reader a hint in the code why we went for
MAX_ORDER_NR_PAGES?

What would happen if the CMA granularity is bigger than
MAX_ORDER_NR_PAGES? I'd assume no harm done, as we'd try aligning to 0.

--
Thanks,

David / dhildenb