Re: Arm64 crash while online/offline memory sections

From: David Hildenbrand
Date: Tue May 25 2021 - 14:12:30 EST


On 25.05.21 20:00, Oscar Salvador wrote:
> On Tue, May 25, 2021 at 05:57:34PM +0000, Qian Cai (QUIC) wrote:
>>> Do we know which patch in particular is problematic?
>>
>> Okay, the winner is "mm,memory_hotplug: Allocate memmap from the added memory range".
>>
>> https://lore.kernel.org/linux-mm/20210421102701.25051-5-osalvador@xxxxxxx/
>
> Ok, which means this is unrelated to having the feature enabled, as it is
> a later patch of that series that actually enables it for arm64.
> Can you work out where exactly the crash happens?
>
> I will have a look into it tomorrow.
>
> Thanks for reporting.


I assume the following will work:

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b31b3af5c490..6e661d106e96 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -218,14 +218,15 @@ static int memory_block_offline(struct memory_block *mem)
 	struct zone *zone;
 	int ret;
 
-	zone = page_zone(pfn_to_page(start_pfn));
-
 	/*
 	 * Unaccount before offlining, such that unpopulated zone and kthreads
 	 * can properly be torn down in offline_pages().
 	 */
-	if (nr_vmemmap_pages)
+	if (nr_vmemmap_pages) {
+		/* Hotplugged memory has no holes. */
+		zone = page_zone(pfn_to_page(start_pfn));
 		adjust_present_page_count(zone, -nr_vmemmap_pages);
+	}
 
 	ret = offline_pages(start_pfn + nr_vmemmap_pages,
 			    nr_pages - nr_vmemmap_pages);


We must not touch pfn_to_page(start_pfn) if start_pfn might fall into a
memory hole. offline_pages() will make sure the range has no holes, but by
then it's too late.
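To illustrate why the ordering matters, here is a minimal userspace sketch, assuming a toy memmap in which a NULL entry stands for a memory hole. All toy_* names are hypothetical stand-ins for pfn_to_page(), page_zone() and offline_pages(); they are not kernel APIs, and the model ignores everything except the zone lookup and the hole check. With the lookup moved inside the nr_vmemmap_pages branch, a block that starts with a hole reaches the hole check and is rejected instead of dereferencing a bad page.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code. toy_pfn_to_page(), toy_page_zone()
 * and toy_offline_pages() are hypothetical stand-ins for pfn_to_page(),
 * page_zone() and offline_pages().
 */
struct zone {
	long present_pages;
};

struct page {
	struct zone *zone;
};

#define TOY_NR_PFNS 8

/* pfn 0 is a hole, so only TOY_NR_PFNS - 1 pages are present. */
static struct zone toy_normal_zone = { .present_pages = TOY_NR_PFNS - 1 };
static struct page toy_pages[TOY_NR_PFNS];
/* A NULL entry models a hole: there is no usable memmap for that pfn. */
static struct page *toy_memmap[TOY_NR_PFNS];

static struct page *toy_pfn_to_page(unsigned long pfn)
{
	return toy_memmap[pfn];
}

/* Like page_zone(), this dereferences the page: a hole (NULL) would crash. */
static struct zone *toy_page_zone(const struct page *page)
{
	return page->zone;
}

static bool toy_range_has_holes(unsigned long start_pfn, unsigned long nr_pages)
{
	for (unsigned long pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++)
		if (!toy_memmap[pfn])
			return true;
	return false;
}

/* Rejects ranges with holes, but only once it actually gets called. */
static int toy_offline_pages(unsigned long start_pfn, unsigned long nr_pages)
{
	if (toy_range_has_holes(start_pfn, nr_pages))
		return -EINVAL;
	return 0;
}

/* Mirrors the fixed ordering: the zone is looked up only when needed. */
static int toy_memory_block_offline(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long nr_vmemmap_pages)
{
	struct zone *zone;

	if (nr_vmemmap_pages) {
		/* Only hotplugged blocks carry vmemmap pages; they have no holes. */
		zone = toy_page_zone(toy_pfn_to_page(start_pfn));
		zone->present_pages -= (long)nr_vmemmap_pages;
	}

	return toy_offline_pages(start_pfn + nr_vmemmap_pages,
				 nr_pages - nr_vmemmap_pages);
}

/* Populate pfns [1, TOY_NR_PFNS); pfn 0 stays a hole. */
static void toy_setup(void)
{
	for (unsigned long pfn = 1; pfn < TOY_NR_PFNS; pfn++) {
		toy_pages[pfn].zone = &toy_normal_zone;
		toy_memmap[pfn] = &toy_pages[pfn];
	}
}
```

Hoisting the toy_page_zone(toy_pfn_to_page(start_pfn)) call above the if, as the pre-fix code does with the real functions, would dereference the NULL entry for pfn 0 and crash before toy_offline_pages() ever runs its hole check.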

--
Thanks,

David / dhildenb