Re: [PATCH] memblock: config the number of init memblock regions

From: Zhouguanghui (OS Kernel)
Date: Wed May 11 2022 - 22:46:32 EST


On 2022/5/11 14:03, Mike Rapoport wrote:
> On Tue, May 10, 2022 at 06:55:23PM -0700, Andrew Morton wrote:
>> On Wed, 11 May 2022 01:05:30 +0000 Zhou Guanghui <zhouguanghui1@xxxxxxxxxx> wrote:
>>
>>> During early boot, the number of memblocks may exceed 128(some memory
>>> areas are not reported to the kernel due to test failures. As a result,
>>> contiguous memory is divided into multiple parts for reporting). If
>>> the size of the init memblock regions is exceeded before the array size
>>> can be resized, the excess memory will be lost.
>
> I'd like to see more details about how firmware creates that sparse memory
> map in the changelog.
>

The scenario is as follows: in a system using HBM, a multi-bit ECC error
occurs and the BIOS records the affected area (for example, 2 MB).
On the next boot, these areas are isolated: they are either not reported
at all or reported as EFI_UNUSABLE_MEMORY. Both cases increase the number
of memblocks, and reporting the areas as EFI_UNUSABLE_MEMORY increases it
more.

For example, if the EFI_UNUSABLE_MEMORY type is reported:
..
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
..
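
To make the difference between the two cases concrete, here is a minimal
sketch of the region arithmetic. It is not kernel code: the helper names
are hypothetical, and it assumes the isolated areas touch neither each
other nor the ends of the range. The visible node 6 entries in the dump
(0x99 to 0x9f) follow the "reported" pattern: three 2 MB areas with
flags 0x4 leave seven regions.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): number of memblock "memory"
 * regions that one contiguous range becomes once `holes` isolated
 * areas are carved out of it.
 *
 * Assumption: the holes touch neither each other nor the range ends.
 *
 * reported == 0: the isolated areas are left out of the map, so the
 *                range falls apart into holes + 1 usable pieces.
 * reported != 0: each isolated area is registered as well, with
 *                different flags, so it cannot merge with its
 *                neighbours and counts as a region of its own,
 *                giving 2 * holes + 1 regions.
 */
static size_t regions_after_isolation(size_t holes, int reported)
{
    if (holes == 0)
        return 1;
    return reported ? 2 * holes + 1 : holes + 1;
}

/*
 * Returns 1 if `ranges` contiguous ranges, each with `holes_per_range`
 * isolated areas, still fit into an init array of `capacity` entries,
 * and 0 otherwise.
 */
static int fits_init_array(size_t ranges, size_t holes_per_range,
                           int reported, size_t capacity)
{
    return ranges * regions_after_isolation(holes_per_range, reported)
           <= capacity;
}
```

With 20 such ranges and three isolated areas each, the unreported case
needs 80 entries and fits into 128, while the reported case needs 140
and does not.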

>>>
>>> ...
>>>
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -89,6 +89,14 @@ config SPARSEMEM_VMEMMAP
>>> pfn_to_page and page_to_pfn operations. This is the most
>>> efficient option when sufficient kernel resources are available.
>>>
>>> +config MEMBLOCK_INIT_REGIONS
>>> + int "Number of init memblock regions"
>>> + range 128 1024
>>> + default 128
>>> + help
>>> + The number of init memblock regions which used to track "memory" and
>>> + "reserved" memblocks during early boot.
>>> +
>>> config HAVE_MEMBLOCK_PHYS_MAP
>>> bool
>>>
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index e4f03a6e8e56..6893d26b750e 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -22,7 +22,7 @@
>>>
>>> #include "internal.h"
>>>
>>> -#define INIT_MEMBLOCK_REGIONS 128
>>> +#define INIT_MEMBLOCK_REGIONS CONFIG_MEMBLOCK_INIT_REGIONS
>>
>> Consistent naming would be nice - MEMBLOCK_INIT versus INIT_MEMBLOCK.

I agree.

>>
>> Can we simply increase INIT_MEMBLOCK_REGIONS to 1024 and avoid the
>> config option? It appears that the overhead from this would be 60kB or
>> so.
>
> 60k is not big, but using 1024 entries array for 2-4 memory banks on
> systems that don't report that fragmented memory map is really a waste.
>
> We can make this per platform opt-in, like INIT_MEMBLOCK_RESERVED_REGIONS ...
>

Given the scenario I described above, do you consider this a general case?
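
For reference, the trade-off discussed above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
struct is a stand-in with an assumed 24-byte layout for a 64-bit NUMA
build, SKETCH_INIT_MEMORY_REGIONS is a hypothetical name, and two init
arrays ("memory" and "reserved") are assumed to scale together. The
#ifndef shows the per-platform opt-in pattern, where only a platform
that defines the macro beforehand pays for a larger array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for a memblock region descriptor. The field layout is an
 * assumption for a 64-bit build with NUMA enabled, not a copy of the
 * kernel definition.
 */
struct region_sketch {
    uint64_t base;   /* start address of the region */
    uint64_t size;   /* length in bytes */
    uint32_t flags;  /* region attributes */
    int32_t  nid;    /* NUMA node id */
};

/*
 * Opt-in pattern: a platform that expects a fragmented memory map can
 * define this macro (hypothetical name) before this point; every other
 * platform keeps the small default.
 */
#ifndef SKETCH_INIT_MEMORY_REGIONS
#define SKETCH_INIT_MEMORY_REGIONS 128
#endif

/* Static footprint of `arrays` init arrays holding `entries` regions each. */
static size_t init_array_bytes(size_t entries, size_t arrays)
{
    return entries * arrays * sizeof(struct region_sketch);
}

/* Additional bytes needed when growing each array from old to new size. */
static size_t extra_bytes(size_t old_entries, size_t new_entries,
                          size_t arrays)
{
    return init_array_bytes(new_entries, arrays)
           - init_array_bytes(old_entries, arrays);
}
```

Under these assumptions, growing both arrays from 128 to 1024 entries
costs extra_bytes(128, 1024, 2) = 43008 bytes, which is the same order
of magnitude as the estimate quoted above.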

>> Or zero if CONFIG_ARCH_KEEP_MEMBLOCK and CONFIG_MEMORY_HOTPLUG
>> are cooperating.
>
> ... or add code that will discard unused parts of memblock arrays even if
> CONFIG_ARCH_KEEP_MEMBLOCK=y.
>

For systems where memory usage is sensitive, should
CONFIG_ARCH_KEEP_MEMBLOCK be set to n, or should the number of regions
be set through a new config option?

Andrew, Mike, thank you.