Re: [PATCH] mm, proc: collect percpu free pages into the free pages
From: mawupeng
Date: Sun Sep 01 2024 - 21:12:18 EST
On 2024/8/30 15:53, Huang, Ying wrote:
> Hi, Wupeng,
>
> Wupeng Ma <mawupeng1@xxxxxxxxxx> writes:
>
>> From: Ma Wupeng <mawupeng1@xxxxxxxxxx>
>>
>> The introduction of Per-CPU-Pageset (PCP) per zone aims to enhance the
>> performance of the page allocator by enabling page allocation without
>> requiring the zone lock. This memory is free, but it is not included in
>> MemFree or MemAvailable.
>>
>> With the support of high-order pcp and pcp auto-tuning, the size of the
>> pages in this list has become a matter of concern due to the following
>> patches:
>>
>> 1. Introduction of Order 1~3 and PMD level PCP in commit 44042b449872
>> ("mm/page_alloc: allow high-order pages to be stored on the per-cpu
>> lists").
>> 2. Introduction of PCP auto-tuning in commit 90b41691b988 ("mm: add
>> framework for PCP high auto-tuning").
>
> With PCP auto-tuning, the idle pages in PCP will be freed to buddy after
> some time (may be as long as tens of seconds in some cases).

Thank you for the detailed explanation regarding PCP auto-tuning. If idle
PCP pages are freed to the buddy allocator after a certain period due to
auto-tuning, then there may be no direct association between PCP auto-tuning
and the increase in the PCP count shown below, especially since no actual
tasks have started after booting. The primary reason for the increase might
instead be the larger number of PCP orders combined with the large number of
CPUs.
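
For reference, a PCP count like the ones discussed in this thread can be
approximated from user space by summing the per-CPU "count:" fields in
/proc/zoneinfo. The following is only a minimal sketch, assuming the usual
zoneinfo layout ("pagesets" / "cpu: N" / "count: X" per zone). The helper
names and the sample text are hypothetical, and this is not necessarily how
the numbers in this mail were collected.

```python
import re

# Matches a per-CPU "count:" line inside a zone's "pagesets" section.
# Assumption: no other line in /proc/zoneinfo starts with "count:".
COUNT_RE = re.compile(r"^\s*count:\s*(\d+)\s*$")


def pcp_pages(zoneinfo_text):
    """Sum the per-CPU pageset 'count:' values over all zones and CPUs."""
    total = 0
    for line in zoneinfo_text.splitlines():
        match = COUNT_RE.match(line)
        if match:
            total += int(match.group(1))
    return total


def read_pcp_pages(path="/proc/zoneinfo"):
    """Convenience wrapper that reads the live file (Linux only)."""
    with open(path) as f:
        return pcp_pages(f.read())


# Hand-written sample in the zoneinfo layout (not real measurements).
SAMPLE = """\
Node 0, zone   Normal
  pages free     1000
  pagesets
    cpu: 0
              count: 12
              high:  100
              batch: 25
    cpu: 1
              count: 30
              high:  100
              batch: 25
"""

if __name__ == "__main__":
    print(pcp_pages(SAMPLE))
```

The result is in base pages, so it has to be multiplied by the page size
before it can be compared with the kB values in /proc/meminfo.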
>
>> As a result, the total amount of pcp memory cannot be ignored, even just
>> after booting without any real tasks, as the results below show:
>>
>>                w/o patch      with patch     diff        diff/total
>> MemTotal:      525424652 kB   525424652 kB         0 kB  0%
>> MemFree:       517030396 kB   520134136 kB   3103740 kB  0.6%
>> MemAvailable:  515837152 kB   518941080 kB   3103928 kB  0.6%

We ran the following experiment, which makes the pcp amount even bigger:
1. Allocate 8G of memory on all of the 600+ CPUs.
2. Kill all of the above user tasks.
3. Wait for 36h.

Afterwards the pcp amount is 6161097 pages (24644M), which is 4.6% of the
total 512G of memory.
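
The conversion behind these figures can be sketched as follows. This is a
minimal worked example, assuming 4 kB base pages and using the MemTotal value
from the table above. The function name is hypothetical.

```python
PAGE_KB = 4  # assumption: 4 kB base pages


def pcp_share(pcp_pages, mem_total_kb, page_kb=PAGE_KB):
    """Return (pcp size in kB, pcp size as a percentage of MemTotal)."""
    pcp_kb = pcp_pages * page_kb
    return pcp_kb, 100.0 * pcp_kb / mem_total_kb


# 6161097 pages in pcp, MemTotal of 525424652 kB
pcp_kb, pct = pcp_share(6161097, 525424652)

if __name__ == "__main__":
    print(pcp_kb, "kB")
    print(round(pct, 2), "% of MemTotal")
```

With these inputs the pcp size comes out at 24644388 kB, which is between
4.6% and 4.7% of MemTotal.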
>>
>> On a machine with 16 zones and 600+ CPUs, prior to these commits, the PCP
>> list contained 274368 pages (1097M) immediately after booting. In the
>> mainline, this number has increased to 3003M, marking a 173% increase.
>>
>> Since available memory is used by numerous services to determine memory
>> pressure, a substantial PCP memory volume leads to an inaccurate estimation
>> of available memory size, significantly impacting the service logic.
>>
>> Remove the useless CONFIG_HIGHMEM in si_meminfo_node() since
>> is_highmem_idx() is always false if the config is not enabled.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
>> Signed-off-by: Liu Shixin <liushixin2@xxxxxxxxxx>
>
> This has been discussed before in the thread of the previous version,
> better to refer to it and summarize it.
>
> [1] https://lore.kernel.org/linux-mm/YwSGqtEICW5AlhWr@xxxxxxxxxxxxxx/

Michal Hocko raised two points in the previous discussion:
1. Is it a real problem?
2. MemAvailable is documented as available without swapping, however
   pcp pages have to be drained first.

Our answers:
1. Available memory is used by numerous services to determine memory
   pressure. A substantial PCP memory volume leads to an inaccurate
   estimation of available memory size, significantly impacting the
   service logic.
2. MemAvailable does seem weird here. There is no reason to drain pcp in
   order to drop clean page cache. As Michal Hocko already pointed out in
   this post [2], dropping clean page cache is much cheaper than draining
   remote pcp. Any idea on this?

[2] https://lore.kernel.org/linux-mm/ZWRYZmulV0B-Jv3k@tiehlicka/
>
> --
> Best Regards,
> Huang, Ying
>