Re: [PATCH 7/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

From: Wei Yang
Date: Sun Feb 09 2020 - 18:52:16 EST


On Sun, Feb 09, 2020 at 06:48:26PM +0800, Baoquan He wrote:
>In section_deactivate(), pfn_to_page() doesn't work any more after
>ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
>It caused hot remove failure, the trace is:
>
>kernel BUG at mm/page_alloc.c:4806!
>invalid opcode: 0000 [#1] SMP PTI
>CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
>Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>RIP: 0010:free_pages+0x85/0xa0
>Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
>Let's defer the ->section_mem_map resetting after depopulate_section_memmap()
>to fix it.
>
>Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
>---
> mm/sparse.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/mm/sparse.c b/mm/sparse.c
>index 623755e88255..345d065ef6ce 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -854,13 +854,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>- ms->section_mem_map = (unsigned long)NULL;
> }
>
> if (section_is_early && memmap)
> free_map_bootmem(memmap);
> else
> depopulate_section_memmap(pfn, nr_pages, altmap);

The crash happens in depopulate_section_memmap() when trying to get memmap by
pfn_to_page(). Can we pass memmap directly?

>+
>+ if(!rc)
>+ ms->section_mem_map = (unsigned long)NULL;
> }
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>--
>2.17.2

--
Wei Yang
Help you, Help me