Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()

From: David Hildenbrand
Date: Tue May 26 2020 - 07:49:47 EST

On 26.05.20 13:32, Mike Rapoport wrote:
> Hello Baoquan,
> On Tue, May 26, 2020 at 04:45:43PM +0800, Baoquan He wrote:
>> On 05/22/20 at 05:20pm, Mike Rapoport wrote:
>>> Hello Baoquan,
>>> On Fri, May 22, 2020 at 03:25:24PM +0800, Baoquan He wrote:
>>>> On 05/22/20 at 03:01pm, Baoquan He wrote:
>>>>> So let's add these unavailable ranges into memblock and reserve them
>>>>> in init_unavailable_range() instead. With this change, they will be added
>>>>> into appropriate node and zone in memmap_init(), and initialized in
>>>>> reserve_bootmem_region() just like any other memblock reserved regions.
>>>> Seems this is not right. They can't get nid in init_unavailable_range().
>>>> Adding e820 ranges may let them get nid. But the hole range won't be
>>>> added to memblock, and still has the issue.
>>>> Nack this one for now, still considering.
>>> Why won't we add the e820 reserved ranges to memblock.memory during
>>> early boot as I suggested?
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index c5399e80c59c..b0940c618ed9 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1301,8 +1301,11 @@ void __init e820__memblock_setup(void)
>>>  		if (end != (resource_size_t)end)
>>>  			continue;
>>>
>>> -		if (entry->type == E820_TYPE_SOFT_RESERVED)
>>> +		if (entry->type == E820_TYPE_SOFT_RESERVED ||
>>> +		    entry->type == E820_TYPE_RESERVED) {
>>> +			memblock_add(entry->addr, entry->size);
>>>  			memblock_reserve(entry->addr, entry->size);
>>> +		}
>>>
>>>  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
>>>  			continue;
>>> The setting of node later in numa_init() will assign the proper node
>>> for these regions as it does for the usable memory.
>> Yes, if it's only related to e820 reserved region, this truly works.
>> However, there are also ACPI table regions. That's why I later started
>> calling the problematic areas firmware reserved ranges.
>> Besides, as you can see in the line below, there's another reserved region which
>> only occupies one page in one memory section. If adding to memblock.memory, we also
>> will build struct mem_section and the relevant struct pages for the whole
>> section. And then the holes around that page will be added and initialized in
>> init_unavailable_mem(). numa_init() will assign proper node for memblock.memory
>> and memblock.reserved, but won't assign proper node for the holes.
>> ~~~
>> [ 0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
>> ~~~
>> So I still think we should not add firmware reserved range into
>> memblock for fixing this issue.
>> And, the fix in the original patch seems necessary. In the compaction
>> code, the migration source is chosen from LRU pages or movable pages,
>> and the migration target has to come from the buddy allocator. However,
>> min_pfn in fast_isolate_freepages() is only calculated from the distance
>> between cc->free_pfn and cc->migrate_pfn, so we can't guarantee that it's
>> safe to use as the target.
> I do not object to your original fix with careful check for pfn validity.
> But I still think that the memory reserved by the firmware is still
> memory and it should be added to memblock.memory. This way the memory

If it's really memory that could be read/written, I think I agree.

> map will be properly initialized from the very beginning and we won't
> need init_unavailable_mem() and alike workarounds. Obviously, the patch

I remember init_unavailable_mem() is necessary for holes within
sections, where we actually *don't* have memory, but we still have
a valid memmap (full section) that we have to initialize.

See the example from 4b094b7851bf ("mm/page_alloc.c: initialize memmap
of unavailable memory directly"). Our main memory ends within a section,
so we have to initialize the remaining parts because the whole section
will be marked valid/online.
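
To put rough numbers on that, here is a minimal user-space sketch of the
arithmetic. The constants assume x86-64 with 4 KiB pages and 128 MiB
sections, and the helper names are made up for illustration; this is not
the kernel code.

```c
/* Assumed values: x86-64, 4 KiB pages, 128 MiB memory sections. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	27
#define PAGES_PER_SECTION	(1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/*
 * Hypothetical helper: given the (exclusive) end pfn of main memory,
 * return the first pfn past the section that contains the last page.
 */
static unsigned long section_end_pfn(unsigned long end_pfn)
{
	return (end_pfn + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1);
}

/*
 * Hypothetical helper: number of struct pages that exist in the memmap
 * of that last section but have no memory behind them.
 */
static unsigned long hole_pages_after(unsigned long end_pfn)
{
	return section_end_pfn(end_pfn) - end_pfn;
}
```

For a made-up machine whose memory ends at 4 GiB + 64 MiB (end pfn
0x104000), the section ends at pfn 0x108000, so 0x4000 struct pages
(64 MiB worth) sit in a valid memmap without backing memory.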

Any way to improve this handling is appreciated. In that patch I also
spelled out that we might want to mark such holes via a new page type,
e.g., PageHole(). Such a page is a memory hole, but has a valid memmap.
Any content in the memmap (zone/node) should be ignored.
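
As a toy model of what I mean (PageHole() does not exist today; the
struct and function below are hypothetical stand-ins, not kernel
definitions), a memmap walker would skip such pages before looking at
the node:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; not the kernel's definition. */
struct toy_page {
	bool hole;	/* what a hypothetical PageHole() would report */
	int nid;	/* node id; meaningless when hole is set */
};

/* Count the pages in pages[0..n) that belong to node nid, skipping holes. */
static size_t count_node_pages(const struct toy_page *pages, size_t n, int nid)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].hole)
			continue;	/* valid memmap entry, nid must be ignored */
		if (pages[i].nid == nid)
			count++;
	}
	return count;
}
```

The point is only the ordering of the checks: the hole test comes first,
so whatever zone/node happens to be stored for a hole never matters.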

But it's all quite confusing, especially across architectures and ...

> above is not enough, but it's a small step in this direction.
> I believe that improving the early memory initialization would make many
> things simpler and more robust, but that's a different story :)

... I second that.


David / dhildenb