Re: [PATCH 1/2] arm64: mem-model: add flatmem model for arm64

From: Ard Biesheuvel
Date: Mon Apr 11 2016 - 07:02:12 EST


On 11 April 2016 at 12:48, Chen Feng <puck.chen@xxxxxxxxxxxxx> wrote:
>
>
> On 2016/4/11 17:59, Chen Feng wrote:
>> Hi Ard,
>>
>> On 2016/4/11 16:00, Ard Biesheuvel wrote:
>>> On 11 April 2016 at 09:55, Chen Feng <puck.chen@xxxxxxxxxxxxx> wrote:
>>>> Hi Ard,
>>>>
>>>> On 2016/4/11 15:35, Ard Biesheuvel wrote:
>>>>> On 11 April 2016 at 04:49, Chen Feng <puck.chen@xxxxxxxxxxxxx> wrote:
>>>>>> Hi will,
>>>>>> Thanks for review.
>>>>>>
>>>>>> On 2016/4/7 22:21, Will Deacon wrote:
>>>>>>> On Tue, Apr 05, 2016 at 04:22:51PM +0800, Chen Feng wrote:
>>>>>>>> We can reduce the memory allocated at mem-map
>>>>>>>> by flatmem.
>>>>>>>>
>>>>>>>> currently, the default memory-model in arm64 is
>>>>>>>> sparse memory. The mem-map array is not freed in
>>>>>>>> this scene. If the physical address is too long,
>>>>>>>> it will reserved too much memory for the mem-map
>>>>>>>> array.
>>>>>>>
>>>>>>> Can you elaborate a bit more on this, please? We use the vmemmap, so any
>>>>>>> spaces between memory banks only burns up virtual space. What exactly is
>>>>>>> the problem you're seeing that makes you want to use flatmem (which is
>>>>>>> probably unsuitable for the majority of arm64 machines).
>>>>>>>
>>>>>> The root cause we want to use flat-mem is the mam_map alloced in sparse-mem
>>>>>> is not freed.
>>>>>>
>>>>>> take a look at here:
>>>>>> arm64/mm/init.c
>>>>>> void __init mem_init(void)
>>>>>> {
>>>>>> #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>>>>> free_unused_memmap();
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> Memory layout (3GB)
>>>>>>
>>>>>> 0 1.5G 2G 3.5G 4G
>>>>>> | | | | |
>>>>>> +--------------+------+---------------+--------------+
>>>>>> | MEM | hole | MEM | IO (regs) |
>>>>>> +--------------+------+---------------+--------------+
>>>>>>
>>>>>>
>>>>>> Memory layout (4GB)
>>>>>>
>>>>>> 0 3.5G 4G 4.5G
>>>>>> | | | |
>>>>>> +-------------------------------------+--------------+-------+
>>>>>> | MEM | IO (regs) | MEM |
>>>>>> +-------------------------------------+--------------+-------+
>>>>>>
>>>>>> Currently, the sparse memory section is 1GB.
>>>>>>
>>>>>> 3GB ddr: the 1.5 ~2G and 3.5 ~ 4G are holes.
>>>>>> 3GB ddr: the 3.5 ~ 4G and 4.5 ~ 5G are holes.
>>>>>>
>>>>>> This will alloc 1G/4K * (struct page) memory for mem_map array.
>>>>>>
>>>>>
>>>>> No, this is incorrect. Sparsemem vmemmap only allocates struct pages
>>>>> for memory regions that are actually populated.
>>>>>
>>>>> For instance, on the Foundation model with 4 GB of memory, you may see
>>>>> something like this in the boot log
>>>>>
>>>>> [ 0.000000] vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000
>>>>> ( 8 GB maximum)
>>>>> [ 0.000000] 0xffffffbdc0000000 - 0xffffffbde2000000
>>>>> ( 544 MB actual)
>>>>>
>>>>> but in reality, only the following regions have been allocated
>>>>>
>>>>> ---[ vmemmap start ]---
>>>>> 0xffffffbdc0000000-0xffffffbdc2000000 32M RW NX SHD AF
>>>>> BLK UXN MEM/NORMAL
>>>>> 0xffffffbde0000000-0xffffffbde2000000 32M RW NX SHD AF
>>>>> BLK UXN MEM/NORMAL
>>>>> ---[ vmemmap end ]---
>>>>>
>>>>> so only 64 MB is used to back 4 GB of RAM with struct pages, which is
>>>>> minimal. Moving to flatmem will not reduce the memory footprint at
>>>>> all.
>>>>
>>>> Yes,but the populate is section, which is 1GB. Take a look at the above
>>>> memory layout.
>>>>
>>>> The section 1G ~ 2G is a section. But 1.5G ~ 2G is a hole.
>>>>
>>>> The section 3G ~ 4G is a section. But 3.5G ~ 4G is a hole.
>>>>>> 0 1.5G 2G 3.5G 4G
>>>>>> | | | | |
>>>>>> +--------------+------+---------------+--------------+
>>>>>> | MEM | hole | MEM | IO (regs) |
>>>>>> +--------------+------+---------------+--------------+
>>>> The hole in 1.5G ~ 2G is also allocated mem-map array. And also with the 3.5G ~ 4G.
>>>>
>>>
>>> No, it is not. It may be covered by a section, but that does not mean
>>> sparsemem vmemmap will actually allocate backing for it. The
>>> granularity used by sparsemem vmemmap on a 4k pages kernel is 128 MB,
>>> due to the fact that the backing is performed at PMD granularity.
>>>
>>> Please, could you share the contents of the vmemmap section in
>>> /sys/kernel/debug/kernel_page_tables of your system running with
>>> sparsemem vmemmap enabled? You will need to set CONFIG_ARM64_PTDUMP=y
>>>
>>
>> Please see the pg-tables below.
>>
>>
>> With sparse and vmemmap enable.
>>
>> ---[ vmemmap start ]---
>> 0xffffffbdc0200000-0xffffffbdc4800000 70M RW NX SHD AF UXN MEM/NORMAL
>> ---[ vmemmap end ]---
>>
>>
>> The board is 4GB, and the memap is 70MB
>> 1G memory --- 14MB mem_map array.
>> So the 4GB has 5 sections, which used 5 * 14MB memory.
>>
>>
> Sorry, 1G memory is 16GB
> 5 sections is 5 * 16 = 80MB
>
> 1G / 4K * (struct page) 64B = 16MB
>
> I don't know why the vmemap dump in pg-tables is 70MB.
>

It may be the PTDUMP code that emits the vmemmap start marker
incorrectly. Could you please double check?

> I add hack code in vmemmap_populate sparse_mem_map_populate.
>
> here is the log:
> sparse_mem_map_populate 188 start ffffffbdc0000000 end ffffffbdc1000000 PAGES_PER_SECTION 40000 nid 0
> vmemmap_populate 549 size 200000 total 200000 addr ffffffbdc0000000
> vmemmap_populate 549 size 200000 total 400000 addr ffffffbdc0200000
> vmemmap_populate 549 size 200000 total 600000 addr ffffffbdc0400000
> vmemmap_populate 549 size 200000 total 800000 addr ffffffbdc0600000
> vmemmap_populate 549 size 200000 total a00000 addr ffffffbdc0800000
> vmemmap_populate 549 size 200000 total c00000 addr ffffffbdc0a00000
> vmemmap_populate 549 size 200000 total e00000 addr ffffffbdc0c00000
> vmemmap_populate 549 size 200000 total 1000000 addr ffffffbdc0e00000
> sparse_mem_map_populate 188 start ffffffbdc1000000 end ffffffbdc2000000 PAGES_PER_SECTION 40000 nid 0
> ...
> sparse_mem_map_populate 188 start ffffffbdc2000000 end ffffffbdc3000000 PAGES_PER_SECTION 40000 nid 0
> sparse_mem_map_populate 188 start ffffffbdc3000000 end ffffffbdc4000000 PAGES_PER_SECTION 40000 nid 0
> sparse_mem_map_populate 188 start ffffffbdc4000000 end ffffffbdc5000000 PAGES_PER_SECTION 40000 nid 0
>
>
> With 4GB memory, it allocated 2MB * 8 * 5 = 80MB.
>> 0 3.5G 4G 4.5G
>> | | | |
>> +-------------------------------------+--------------+-------+
>> | MEM | IO (regs) | MEM |
>> +-------------------------------------+--------------+-------+
>
> 4GB memory ,5 sections. 80MB mem_map allocated.
>

I suppose using

#define SECTION_SIZE_BITS 29

in arch/arm64/include/asm/sparsemem.h would get rid of the overhead
completely in this particular case. Could you confirm, please?

@Will: is the rationale for the default value of 30 for
SECTION_SIZE_BITS documented anywhere? Compared to other
architectures, it seems on the high side, but I did notice that 64k
granule kernels require at least 28 in order not to trigger the
following assertion

include/linux/mmzone.h:1029:2: error: #error Allocator MAX_ORDER
exceeds SECTION_SIZE