Re: [PATCH v2] mm: include CMA pages in lowmem_reserve at boot

From: Andrew Morton
Date: Tue Aug 18 2020 - 23:18:20 EST


On Fri, 14 Aug 2020 09:49:26 -0700 Doug Berger <opendmb@xxxxxxxxx> wrote:

> The lowmem_reserve arrays provide a means of applying pressure
> against allocations from lower zones that were targeted at
> higher zones. Their values are a function of the number of pages
> managed by higher zones and are assigned by a call to the
> setup_per_zone_lowmem_reserve() function.
>
> The function is initially called at boot time by the function
> init_per_zone_wmark_min() and may be called later by accesses
> of the /proc/sys/vm/lowmem_reserve_ratio sysctl file.
>
> The function init_per_zone_wmark_min() was moved up from a
> module_init to a core_initcall to resolve a sequencing issue
> with khugepaged. Unfortunately this created a sequencing issue
> with CMA page accounting.
>
> The CMA pages are added to the managed page count of a zone
> when cma_init_reserved_areas() is called at boot also as a
> core_initcall. This makes it uncertain whether the CMA pages
> will be added to the managed page counts of their zones before
> or after the call to init_per_zone_wmark_min() as it becomes
> dependent on link order. With the current link order the pages
> are added to the managed count after the lowmem_reserve arrays
> are initialized at boot.
>
> This means the lowmem_reserve values at boot may be lower than
> the values used later if /proc/sys/vm/lowmem_reserve_ratio is
> accessed even if the ratio values are unchanged.
>
> In many cases the difference is not significant, but for example
> an ARM platform with 1GB of memory and the following memory layout
> [ 0.000000] cma: Reserved 256 MiB at 0x0000000030000000
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff]
> [ 0.000000] Normal empty
> [ 0.000000] HighMem [mem 0x0000000030000000-0x000000003fffffff]
>
> would result in 0 lowmem_reserve for the DMA zone. This would allow
> userspace to deplete the DMA zone easily.

Sounds fairly serious for those machines. Was a cc:stable considered?
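
To put rough numbers on the example above, here is a small userspace
model of the reserve arithmetic. It is only a sketch, not the kernel
code: the model_* names are made up for illustration, and the 4 KiB
page size and the ratios {256, 32, 0} are assumptions, as is treating
every page of a zone as managed.

```c
#include <stddef.h>

/* Illustrative userspace model, not kernel code. */
enum { MODEL_NR_ZONES = 3 };    /* DMA, Normal, HighMem as in the quoted layout */

struct model_zone {
	unsigned long managed_pages;                  /* pages handed to the allocator */
	unsigned long lowmem_reserve[MODEL_NR_ZONES]; /* indexed by the targeted zone */
};

/*
 * For each targeted zone j, every lower zone i holds back
 * (pages managed by zones i+1..j) / ratio[i].
 * A ratio of 0 disables the reserve for that lower zone.
 */
static void model_setup_lowmem_reserve(struct model_zone *zones,
				       const unsigned long *ratio)
{
	for (size_t j = 0; j < MODEL_NR_ZONES; j++) {
		unsigned long higher_pages = zones[j].managed_pages;

		zones[j].lowmem_reserve[j] = 0;
		for (size_t i = j; i-- > 0; ) {
			zones[i].lowmem_reserve[j] =
				ratio[i] ? higher_pages / ratio[i] : 0;
			higher_pages += zones[i].managed_pages;
		}
	}
}

/* DMA zone reserve against HighMem-targeted allocations, in pages. */
static unsigned long model_dma_reserve_for_highmem(unsigned long highmem_managed)
{
	struct model_zone zones[MODEL_NR_ZONES] = {
		{ .managed_pages = 196608 },          /* DMA: 768 MiB, assumed 4 KiB pages */
		{ .managed_pages = 0 },               /* Normal: empty */
		{ .managed_pages = highmem_managed }, /* HighMem: all CMA in the example */
	};
	const unsigned long ratio[MODEL_NR_ZONES] = { 256, 32, 0 }; /* assumed ratios */

	model_setup_lowmem_reserve(zones, ratio);
	return zones[0].lowmem_reserve[2];
}
```

With the 256 MiB of CMA not yet counted (highmem_managed == 0) the
model gives the DMA zone a reserve of 0 pages; once the 65536 CMA pages
are counted it gives 65536 / 256 = 256 pages, under the assumptions
stated above.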

> Funnily enough
> $ cat /proc/sys/vm/lowmem_reserve_ratio
> would fix up the situation because it forces
> setup_per_zone_lowmem_reserve as a side effect.
>
> This commit breaks the link order dependency by invoking
> init_per_zone_wmark_min() as a postcore_initcall so that the
> CMA pages have the chance to be properly accounted in their
> zone(s), allowing the lowmem_reserve arrays to receive
> consistent values.
>
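
The ordering argument can be modelled in userspace as well. The sketch
below is an illustration only: all model_* names are hypothetical, the
link positions are picked to match the "current link order" described
above, and the 65536-page figure assumes 4 KiB pages. Initcalls run in
level order (core before postcore), and within a level in link order.

```c
#include <stdlib.h>

enum model_level { MODEL_CORE = 1, MODEL_POSTCORE = 2 };

struct model_state {
	unsigned long managed_pages; /* managed pages of the zone holding CMA */
	unsigned long pages_seen;    /* what the reserve setup observed */
};

struct model_call {
	enum model_level level;      /* initcall level */
	int link_order;              /* position within a level, set by the linker */
	void (*fn)(struct model_state *);
};

/* Stands in for the CMA accounting step: 256 MiB of assumed 4 KiB pages. */
static void model_cma_init(struct model_state *s)
{
	s->managed_pages += 65536;
}

/* Stands in for the reserve setup: snapshots the managed page count. */
static void model_wmark_init(struct model_state *s)
{
	s->pages_seen = s->managed_pages;
}

/* Order by level first, then by link order within a level. */
static int model_cmp(const void *a, const void *b)
{
	const struct model_call *x = a, *y = b;

	if (x->level != y->level)
		return (x->level > y->level) - (x->level < y->level);
	return (x->link_order > y->link_order) - (x->link_order < y->link_order);
}

/* Returns the page count seen by the reserve setup at "boot". */
static unsigned long model_boot(enum model_level wmark_level)
{
	struct model_state s = { 0, 0 };
	struct model_call calls[] = {
		{ wmark_level, 0, model_wmark_init }, /* assumed to be linked first */
		{ MODEL_CORE,  1, model_cma_init },   /* assumed to be linked later */
	};
	size_t n = sizeof(calls) / sizeof(calls[0]);

	qsort(calls, n, sizeof(calls[0]), model_cmp);
	for (size_t i = 0; i < n; i++)
		calls[i].fn(&s);
	return s.pages_seen;
}
```

model_boot(MODEL_CORE) returns 0 because the snapshot is taken before
the CMA pages are counted; model_boot(MODEL_POSTCORE) returns 65536
regardless of the link positions, since the level decides the order.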