Re: [PATCH] mm/page_alloc: fix memmap_init_zone pageblock alignment

From: Daniel Vacek
Date: Thu Mar 01 2018 - 10:09:44 EST


ffffe31d01ed8000 7b600000 0 0 0 0
On Thu, Mar 1, 2018 at 2:10 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
>> In move_freepages() a BUG_ON() can be triggered on uninitialized page structures
>> due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the
>> same way as in move_freepages_block() simply fixes those crashes.
>
> This changelog doesn't describe how the fix works. Why doesn't
> memblock_next_valid_pfn return the first valid pfn as one would expect?

Actually it does. The point is it is not guaranteed to be pageblock
aligned. And we
actually want to initialize even those page structures which are
outside of the range.
Hence the alignment here.

For example from reproducer machine, memory map from e820/BIOS:

$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM

Page structures before commit b92df1de5d28:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068
uptodate,lru,active,mappedtodisk <<<< start of the range here
fffff73941ee0000 7b800000 0 0 1 1fffff00000000
fffff73941ffffc0 7ffff000 0 0 1 1fffff00000000

So far so good.

After commit b92df1de5d28 machine eventually crashes with:

BUG at mm/page_alloc.c:1913

> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));

>From registers and stack I digged start_page points to
ffffe31d01ed8000 (note that this is
page ffffe31d01edffc0 aligned to pageblock) and I can see this in memory dump:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffe31d01e00000 78000000 0 0 0 0
ffffe31d01ed7fc0 7b5ff000 0 0 0 0
ffffe31d01ed8000 7b600000 0 0 0 0 <<<< note
that nodeid and zonenr are encoded in top bits of page flags which are
not initialized here, hence the crash :-(
ffffe31d01edff80 7b7fe000 0 0 0 0
ffffe31d01edffc0 7b7ff000 0 0 1 1fffff00000000
ffffe31d01ee0000 7b800000 0 0 1 1fffff00000000
ffffe31d01ffffc0 7ffff000 0 0 1 1fffff00000000

With my fix applied:

crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
7b800000 7ffff000 80000000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001e00000 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
<<<< vital data filled in here this time \o/
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068
uptodate,lru,active,mappedtodisk
ffffea0001ee0000 7b800000 0 0 1 1fffff00000000
ffffea0001ffffc0 7ffff000 0 0 1 1fffff00000000

We are not interested in the beginning of whole section. Just the
pages in the first
populated block where the range begins are important (actually just
the first one really, but...).


> It would be also good put the panic info in the changelog.

Of course I forgot to link the related bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=196443

Though it is not very well explained there as well. I hope my notes
above make it clear.


>> Fixes: b92df1de5d28 ("[mm] page_alloc: skip over regions of invalid pfns where possible")
>> Signed-off-by: Daniel Vacek <neelx@xxxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> ---
>> mm/page_alloc.c | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cb416723538f..9edee36e6a74 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>> /*
>> * Skip to the pfn preceding the next valid one (or
>> * end_pfn), such that we hit a valid pfn (or end_pfn)
>> - * on our next iteration of the loop.
>> + * on our next iteration of the loop. Note that it needs
>> + * to be pageblock aligned even when the region itself
>> + * is not as move_freepages_block() can shift ahead of
>> + * the valid region but still depends on correct page
>> + * metadata.
>> */
>> - pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
>> + pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
>> + ~(pageblock_nr_pages-1)) - 1;
>> #endif
>> continue;
>> }
>> --
>> 2.16.2
>>
>
> --
> Michal Hocko
> SUSE Labs