Re: kernel panic due to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2830bf6f05fb3e05bc4743274b806c821807a684

From: Mikhail Gavrilov
Date: Mon Jan 28 2019 - 01:37:15 EST


> Linus, could you take the revert please?
>
> From 817b18d3db36a6900ca9043af8c1416c56358be3 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Fri, 25 Jan 2019 19:08:58 +0100
> Subject: [PATCH] Revert "mm, memory_hotplug: initialize struct pages for the
> full memory section"
>
> This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684.
>
> The underlying assumption that one sparse section belongs into a single
> numa node doesn't hold really. Robert Shteynfeld has reported a boot
> failure. The boot log was not captured but his memory layout is as
> follows:
> [ 0.286954] Early memory node ranges
> [ 0.286955] node 1: [mem 0x0000000000001000-0x0000000000090fff]
> [ 0.286955] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
> [ 0.286956] node 1: [mem 0x0000000100000000-0x0000001423ffffff]
> [ 0.286956] node 0: [mem 0x0000001424000000-0x0000002023ffffff]
>
> This means that node0 starts in the middle of a memory section which is
> also in node1. memmap_init_zone tries to initialize padding of a section
> even when it is outside of the given pfn range because there are code
> paths (e.g. memory hotplug) which assume that the full worth of memory
> section is always initialized. In this particular case, though, such a
> range is already initialized and most likely already managed by the page
> allocator. Scribbling over those pages corrupts the internal state and
> likely blows up when any of those pages gets used.
>
> Reported-by: Robert Shteynfeld <robert.shteynfeld@xxxxxxxxx>
> Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
> Cc: stable
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> mm/page_alloc.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d295c9bc01a8..35fdde041f5c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5701,18 +5701,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> cond_resched();
> }
> }
> -#ifdef CONFIG_SPARSEMEM
> - /*
> - * If the zone does not span the rest of the section then
> - * we should at least initialize those pages. Otherwise we
> - * could blow up on a poisoned page in some paths which depend
> - * on full sections being initialized (e.g. memory hotplug).
> - */
> - while (end_pfn % PAGES_PER_SECTION) {
> - __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
> - end_pfn++;
> - }
> -#endif
> }
>
> #ifdef CONFIG_ZONE_DEVICE

Michal, I suppose that reverting commit
2830bf6f05fb3e05bc4743274b806c821807a684 will bring back my issue:
https://marc.info/?l=linux-mm&m=154499704718428
Will a better approach be proposed for fixing my issue?
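
For reference, the overlap described in the quoted commit message can be
checked with a small user-space calculation. This is only a sketch: the
values of PAGE_SHIFT and SECTION_SIZE_BITS below are assumed x86_64
defaults (4 KiB pages, 128 MiB sections) and are not stated in the report,
and phys_to_pfn() and section_padding_pages() are hypothetical helper
names, not kernel functions.

```c
#include <stdint.h>

/* Assumed x86_64 defaults; the report does not state them. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PAGES_PER_SECTION ((uint64_t)1 << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical helper: convert a physical address to a page frame number. */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

/*
 * Hypothetical helper: count the pfns that the reverted
 * "while (end_pfn % PAGES_PER_SECTION)" loop would have initialized
 * past end_pfn, i.e. the distance to the next section boundary.
 */
static uint64_t section_padding_pages(uint64_t end_pfn)
{
	uint64_t rem = end_pfn % PAGES_PER_SECTION;

	return rem ? PAGES_PER_SECTION - rem : 0;
}
```

With the quoted layout, node 1 ends and node 0 starts at physical address
0x1424000000. Under the assumptions above,
section_padding_pages(phys_to_pfn(0x1424000000ULL)) evaluates to 16384
pages (64 MiB), so the removed loop would have re-initialized that many
struct pages that already belong to node 0.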

--
Best Regards,
Mike Gavrilov.