Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9

From: David Hildenbrand
Date: Thu Apr 22 2021 - 14:35:38 EST


On 22.04.21 19:50, Florian Fainelli wrote:


On 4/22/2021 1:56 AM, David Hildenbrand wrote:
On 22.04.21 09:49, Michal Hocko wrote:
Cc David and Oscar who are familiar with this code as well.

On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
Hi all,

I have been trying for the past few days to identify the source of a
performance regression that we are seeing with the 5.4 kernel but not
with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
challenging at the moment but will happen eventually.

What we are seeing is a ~3x increase in the time needed for
alloc_contig_range() to allocate 1GB in blocks of 2MB pages. The system
is idle at the time and there are no other contenders for memory other
than the user-space programs already started (DHCP client, shell, etc.).
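For reference, here is the size arithmetic behind "1GB in blocks of 2MB pages". It assumes 4 KiB base pages, the usual ARM64 default; 16K or 64K page configs change the numbers. The base PFN and all helper names are hypothetical, for illustration only. Under those assumptions the request becomes 512 separate alloc_contig_range() calls, each covering a [start, end) range of 512 PFNs:

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB base pages (PAGE_SHIFT == 12), the usual ARM64 default. */
#define ASSUMED_PAGE_SHIFT 12UL
#define BLOCK_BYTES   (2UL << 20)  /* one 2 MiB block per allocation */
#define REQUEST_BYTES (1UL << 30)  /* 1 GiB requested in total */

/* Pages covered by one 2 MiB block: 2 MiB / 4 KiB = 512. */
static unsigned long pages_per_block(void)
{
	return BLOCK_BYTES >> ASSUMED_PAGE_SHIFT;
}

/* 2 MiB blocks needed to cover 1 GiB: 1 GiB / 2 MiB = 512. */
static unsigned long blocks_per_request(void)
{
	return REQUEST_BYTES / BLOCK_BYTES;
}

/*
 * Print the [start, end) PFN range each alloc_contig_range() call would be
 * asked to handle, assuming the blocks are carved back-to-back from
 * base_pfn. base_pfn is hypothetical and must be 2 MiB aligned.
 */
static void print_block_ranges(unsigned long base_pfn, unsigned long nr)
{
	unsigned long i;

	assert(base_pfn % pages_per_block() == 0);
	for (i = 0; i < nr && i < blocks_per_request(); i++) {
		unsigned long start = base_pfn + i * pages_per_block();
		unsigned long end = start + pages_per_block();

		printf("call %lu: pfn [%#lx, %#lx)\n", i, start, end);
	}
}
```

For example, print_block_ranges(0x80000, 2) prints the first two ranges, [0x80000, 0x80200) and [0x80200, 0x80400). Because the work is split into 512 calls, any fixed per-call cost is paid 512 times over the whole 1 GiB request.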

Hi,

If you can easily reproduce it, it might be worth just trying a bisection;
that could be faster than manually poking around in the code.

Also, it would be worth having a look at the state of upstream Linux.
Upstream Linux developers tend not to care about minor performance
regressions on oldish kernels.

This is a big pain point here and I could not agree more, but until we
bridge that gap, this is unfortunately not easy for me to do, and
neither is bisection :/


There has been work on improving exactly the situation you are
describing -- a "fail fast" / "no retry" mode for alloc_contig_range().
Maybe it addresses this issue.

https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx

Minchan is already on cc.

This patch does not appear to help; in fact, I had already applied
this patch from way back when locally:

https://lkml.org/lkml/2014/5/28/113

which would effectively do this unconditionally. Let me see if I can
reproduce this problem on an x86 virtual machine operating under
conditions similar to ours.

How exactly are you allocating these 2MiB blocks?

Via CMA->alloc_contig_range() or via alloc_contig_range() directly? I assume via CMA.

For

https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx

to do its work, you'll have to pass __GFP_NORETRY to alloc_contig_range(). This requires adapting CMA, which is where we call alloc_contig_range() from.

--
Thanks,

David / dhildenb