Re: [PATCH 1/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
From: Vlastimil Babka
Date: Mon Oct 21 2019 - 07:42:31 EST
On 10/21/19 11:48 AM, Mel Gorman wrote:
> Deferred memory initialisation updates zone->managed_pages during
> the initialisation phase but before that finishes, the per-cpu page
> allocator (pcpu) calculates the number of pages allocated/freed in
> batches as well as the maximum number of pages allowed on a per-cpu list.
> As zone->managed_pages is not up to date yet, the pcpu initialisation
> calculates inappropriately low batch and high values.
>
> This increases zone lock contention quite severely in some cases with the
> degree of severity depending on how many CPUs share a local zone and the
> size of the zone. A private report indicated that kernel build times were
> excessive with extremely high system CPU usage. A perf profile indicated
> that a large chunk of time was lost on zone->lock contention.
>
> This patch recalculates the pcpu batch and high values after deferred
> initialisation completes for every populated zone in the system. It
> was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
> workload -- allmodconfig and all available CPUs.
>
> mmtests configuration: config-workload-kernbench-max
> Configuration was modified to build on a fresh XFS partition.
>
> kernbench
> 5.4.0-rc3 5.4.0-rc3
> vanilla resetpcpu-v2
> Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
> Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
> Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
> Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
> Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
> Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
>
> 5.4.0-rc3 5.4.0-rc3
> vanilla resetpcpu-v2
> Duration User 39766.24 49221.79
> Duration System 44298.10 13361.67
> Duration Elapsed 519.11 388.87
>
> The patch reduces system CPU usage by 69.86% and total build time by
> 26.65%. The variance of system CPU usage is also much reduced.
>
> Before, this was the breakdown of batch and high values over all zones was.
>
> 256 batch: 1
> 256 batch: 63
> 512 batch: 7
> 256 high: 0
> 256 high: 378
> 512 high: 42
>
> 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After the patch
>
> 256 batch: 1
> 768 batch: 63
> 256 high: 0
> 768 high: 378
>
> Cc: stable@xxxxxxxxxxxxxxx # v4.1+
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
> mm/page_alloc.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c0b2e0306720..f972076d0f6b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1947,6 +1947,14 @@ void __init page_alloc_init_late(void)
> /* Block until all are initialised */
> wait_for_completion(&pgdat_init_all_done_comp);
>
> + /*
> + * The number of managed pages has changed due to the initialisation
> + * so the pcpu batch and high limits needs to be updated or the limits
> + * will be artificially small.
> + */
> + for_each_populated_zone(zone)
> + zone_pcp_update(zone);
> +
> /*
> * We initialized the rest of the deferred pages. Permanently disable
> * on-demand struct page initialization.
>