Re: [PATCH] mm: include CMA pages in lowmem_reserve at boot
From: Michal Hocko
Date: Fri Aug 14 2020 - 02:59:12 EST
On Thu 13-08-20 10:55:17, Doug Berger wrote:
[...]
> One example might be a 1GB arm platform that defines a 256MB default CMA
> region. The default zones might map as follows:
> [ 0.000000] cma: Reserved 256 MiB at 0x0000000030000000
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> [ 0.000000] Normal empty
> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000003fffffff]
[...]
>
> Here you can see that the lowmem_reserve array for the DMA zone is all
> 0's. This is because the HighMem zone is consumed by the CMA region
> whose pages haven't been activated to increase the zone managed count
> when init_per_zone_wmark_min() is invoked at boot.
>
> If we access the /proc/sys/vm/lowmem_reserve_ratio sysctl with:
> # cat /proc/sys/vm/lowmem_reserve_ratio
> 256 32 0 0
Yes, this is really unexpected behavior.
[...]
> Here the lowmem_reserve back pressure for the DMA zone for allocations
> that target the HighMem zone is now 256 pages. Now 1MB is still not a
> lot of additional back pressure, but the watermarks on the HighMem zone
> aren't very large either so User space allocations can easily start
> consuming the DMA zone while kswapd starts trying to reclaim space in
> HighMem. This excess pressure on DMA zone memory can potentially lead to
> earlier triggers of OOM Killer and/or kernel fallback allocations into
> CMA Movable pages which can interfere with the ability of CMA to obtain
> larger size contiguous allocations.
>
> All of that said, my main concern is that I don't like the inconsistency
> between the boot time and run time results.
Thanks for the clarification. I would suggest extending your changelog with
the following.
"
In many cases the difference is not significant, but for example an ARM
platform with 1GB of memory and the following memory layout
[ 0.000000] cma: Reserved 256 MiB at 0x0000000030000000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
[ 0.000000] Normal empty
[ 0.000000] HighMem [mem 0x0000000030000000-0x000000003fffffff]
would result in a lowmem_reserve of 0 for the DMA zone. This would allow
userspace to deplete the DMA zone easily. Funnily enough
$ cat /proc/sys/vm/lowmem_reserve_ratio
would fix up the situation because, as a side effect, it forces a call to
setup_per_zone_lowmem_reserve().
"
With that feel free to add
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Thanks!
--
Michal Hocko
SUSE Labs