Re: alloc_contig_range() with MIGRATE_MOVABLE performance regression since 4.9

From: David Hildenbrand
Date: Thu Apr 22 2021 - 04:56:33 EST


On 22.04.21 09:49, Michal Hocko wrote:
Cc David and Oscar who are familiar with this code as well.

On Wed 21-04-21 11:36:01, Florian Fainelli wrote:
Hi all,

I have been trying for the past few days to identify the source of a
performance regression that we are seeing with the 5.4 kernel but not
with the 4.9 kernel on ARM64. Testing something newer like 5.10 is a bit
challenging at the moment but will happen eventually.

What we are seeing is a ~3x increase in the time needed for
alloc_contig_range() to allocate 1GB in blocks of 2MB pages. The system
is idle at the time and there are no other contenders for memory other
than the user-space programs already started (DHCP client, shell, etc.).

Hi,

If you can reproduce this easily, it might be worth just bisecting; that could be faster than manually poking around in the code.

Also, it would be worth having a look at the state of upstream Linux. Upstream Linux developers tend to not care about minor performance regressions on oldish kernels.

There has been work on improving exactly the situation you are describing -- a "fail fast" / "no retry" mode for alloc_contig_range(). Maybe it tackles exactly this issue.

https://lkml.kernel.org/r/20210121175502.274391-3-minchan@xxxxxxxxxx

Minchan is already on cc.
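
To make the difference between "retry on -EBUSY" and "fail fast" concrete, here is a minimal user-space sketch. It is not the kernel implementation and not what the patch above does internally; alloc_with_retry(), busy_then_ok() and the alloc_fn callback type are made-up names used purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator callback: returns 0 on success or a negative errno. */
typedef int (*alloc_fn)(void *ctx);

/*
 * Call alloc() up to max_attempts times, retrying only on -EBUSY.
 * max_attempts == 1 is the "fail fast" / "no retry" case.
 * The number of calls actually made is stored in *attempts.
 */
static int alloc_with_retry(alloc_fn alloc, void *ctx,
                            unsigned int max_attempts, unsigned int *attempts)
{
    int ret = -EINVAL;
    unsigned int i;

    assert(alloc != NULL && attempts != NULL);
    *attempts = 0;
    for (i = 0; i < max_attempts; i++) {
        ret = alloc(ctx);
        (*attempts)++;
        if (ret != -EBUSY)
            break;  /* success or a hard error: stop retrying */
    }
    return ret;
}

/* Stand-in for a range that stays busy for the first *ctx calls. */
static int busy_then_ok(void *ctx)
{
    unsigned int *busy_left = ctx;

    if (*busy_left > 0) {
        (*busy_left)--;
        return -EBUSY;
    }
    return 0;
}
```

With a range that is busy twice, max_attempts == 1 returns -EBUSY after a single call, while max_attempts == 5 succeeds on the third call. Every extra attempt is paid for in wall-clock time, so a caller-side retry loop can hide or amplify a per-call slowdown.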

(next time, please cc linux-mm on core-mm questions; maybe you tried, but ended up with linux-mmc :) )


I have tried playing with the compact_control structure settings but
have not found anything that would bring us back to the performance of
4.9. More often than not, we see test_pages_isolated() returning a
non-zero error code, which would explain the slowdown, since we have
some logic that re-tries the allocation if alloc_contig_range() returns
-EBUSY. If I remove the retry logic however, we don't get -EBUSY and we
get the results below:

4.9 shows this:

[ 457.537634] allocating: size: 1024MB avg: 59172 (us), max: 137306 (us), min: 44859 (us), total: 591723 (us), pages: 512, per-page: 115 (us)
[ 457.550222] freeing: size: 1024MB avg: 67397 (us), max: 151408 (us), min: 52630 (us), total: 673974 (us), pages: 512, per-page: 131 (us)

5.4 shows this:

[ 222.388758] allocating: size: 1024MB avg: 156739 (us), max: 157254 (us), min: 155915 (us), total: 1567394 (us), pages: 512, per-page: 306 (us)
[ 222.401601] freeing: size: 1024MB avg: 209899 (us), max: 210085 (us), min: 209749 (us), total: 2098999 (us), pages: 512, per-page: 409 (us)
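
As a quick cross-check of the numbers above, the following small C sketch recomputes the per-block time and the slowdown from the "avg" and "pages" fields. It assumes that "avg" is the time for one 1024MB pass (the totals are 10x the averages, which suggests 10 passes) and that "pages" counts 2MB blocks; the struct and function names are illustrative only and are not taken from the test module.

```c
#include <assert.h>

/* One line of the timing output quoted above; times are in microseconds. */
struct run_stats {
    unsigned long avg_us;  /* "avg": assumed average time for one 1024MB pass */
    unsigned long blocks;  /* "pages": assumed number of 2MB blocks per pass */
};

/* Average time per 2MB block, truncated like the log's "per-page" field. */
static unsigned long per_block_us(struct run_stats s)
{
    assert(s.blocks != 0);
    return s.avg_us / s.blocks;
}

/* Slowdown of `after` relative to `before` in percent (200 means 2x). */
static unsigned long slowdown_pct(struct run_stats before, struct run_stats after)
{
    assert(before.avg_us != 0);
    return after.avg_us * 100 / before.avg_us;
}

/* Values copied from the 4.9 and 5.4 logs above. */
static const struct run_stats alloc_4_9 = { 59172, 512 };
static const struct run_stats alloc_5_4 = { 156739, 512 };
static const struct run_stats free_4_9  = { 67397, 512 };
static const struct run_stats free_5_4  = { 209899, 512 };
```

per_block_us() reproduces the logged per-page values (115 and 306 us for allocating, 131 and 409 us for freeing). slowdown_pct() gives 264 for allocating and 311 for freeing, so with the retry logic removed the averages are roughly 2.6x and 3.1x slower on 5.4.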

This regression is not seen when MIGRATE_CMA is specified instead of
MIGRATE_MOVABLE.

A few characteristics that you should probably be aware of:

- There is 4GB of memory populated, mapped into the CPU's address
space starting at 0x4000_0000 (1GB); PAGE_SIZE is 4KB

- There is a ZONE_DMA32 that starts at 0x4000_0000 and ends at
0xE480_0000; from there on we have a ZONE_MOVABLE which is comprised of
0xE480_0000 - 0xFDC0_0000 and another range spanning 0x1_0000_0000 -
0x1_4000_0000
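
For reference, the zone sizes implied by these addresses can be worked out with a few lines of C. This is only a sketch: it assumes each range's end address is exclusive and ignores any holes or reserved regions inside the ranges; struct phys_range and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Physical address range; `end` is assumed to be exclusive. */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/* Size of the range in MiB, rounded down. */
static uint64_t range_size_mib(struct phys_range r)
{
    assert(r.end >= r.start);
    return (r.end - r.start) / MiB;
}

/* Number of whole blocks of block_mib MiB that fit in the range. */
static uint64_t range_blocks(struct phys_range r, uint64_t block_mib)
{
    assert(block_mib != 0);
    return range_size_mib(r) / block_mib;
}

/* Ranges as listed above. */
static const struct phys_range zone_dma32      = { 0x40000000ULL,  0xE4800000ULL };
static const struct phys_range zone_movable_lo = { 0xE4800000ULL,  0xFDC00000ULL };
static const struct phys_range zone_movable_hi = { 0x100000000ULL, 0x140000000ULL };
```

Under those assumptions ZONE_DMA32 spans 2632MB and the two ZONE_MOVABLE ranges span 404MB and 1024MB, i.e. 1428MB or 714 blocks of 2MB in total. A 1GB test allocation (512 blocks of 2MB) is therefore a large share of ZONE_MOVABLE.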

Attached is the kernel configuration.

Thanks!
--
Florian





--
Thanks,

David / dhildenb