On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
> Commit c1cc1552616d ("arm64: MMU initialisation") optimizes vmemmap
> population to work at the PMD section level. However, if start or end
> is not aligned to a section boundary, such as when a subsection is
> hot-added, populating the entire section is wasteful. For instance, if
> only one subsection is hot-added, the struct page metadata for the
> entire section is still populated. In such cases, it is more effective
> to populate at page granularity.
OK, so from the vmemmap perspective, we waste up to 2MB of memory that
has been allocated even if a 2MB hot-plugged subsection requires only
32KB of struct page. I don't mind this much really. I hope all those
subsections are not scattered around to amplify this waste.
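To put rough numbers on it (userspace sketch, not kernel code; assumes
4K pages, 128MB sections, 2MB subsections and a 64-byte struct page):

#include <stdio.h>

int main(void)
{
	/* Assumed example values: 4K pages, 64-byte struct page. */
	unsigned long page_size = 4096;
	unsigned long struct_page_size = 64;
	unsigned long subsection = 2UL << 20;	/* 2MB subsection */
	unsigned long section = 128UL << 20;	/* 128MB section  */

	/* vmemmap (struct page) needed for one subsection vs one section */
	printf("subsection vmemmap: %lu KB\n",
	       subsection / page_size * struct_page_size / 1024);	/* 32 KB   */
	printf("section vmemmap:    %lu KB\n",
	       section / page_size * struct_page_size / 1024);		/* 2048 KB */
	return 0;
}

So a PMD-mapped vmemmap hands out the full 2MB block even when a lone
subsection only needs 32KB of it.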
> This change also addresses mismatch issues during vmemmap_free(): when
> pmd_sect() is true, the entire PMD section is cleared, even if other
> subsections in it are still active. For example, pagemap1 and pagemap2
> are part of a single PMD entry and are hot-added sequentially. Then,
> when pagemap1 is removed, vmemmap_free() clears the entire PMD entry,
> freeing the struct page metadata for the whole section, even though
> pagemap2 is still active.
I think that's the bigger issue: we can't unplug only a subsection.
Looking at unmap_hotplug_pmd_range(), it frees a 2MB vmemmap section,
but that section may hold the struct page entries for the equivalent of
128MB of memory. So any struct page accesses for the other subsections
will fault.
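Roughly, the failure mode looks like this (again a userspace sketch of
the address arithmetic only; assumes 4K pages, a 64-byte struct page and
2MB vmemmap PMDs): two subsections that are 2MB apart have their struct
page arrays in the same vmemmap PMD, so tearing that PMD down for one of
them also unmaps the other's struct pages.

#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;		/* 4K pages */
	unsigned long struct_page_size = 64;	/* assumed sizeof(struct page) */
	unsigned long pmd_size = 2UL << 20;	/* 2MB vmemmap PMD */

	/* Two hot-added 2MB subsections, adjacent in physical memory. */
	unsigned long pfn1 = 0x80000000UL / page_size;
	unsigned long pfn2 = (0x80000000UL + (2UL << 20)) / page_size;

	/* Byte offsets of their struct page arrays into the vmemmap. */
	unsigned long off1 = pfn1 * struct_page_size;
	unsigned long off2 = pfn2 * struct_page_size;

	/* Both offsets fall into the same 2MB vmemmap PMD. */
	printf("same vmemmap PMD: %s\n",
	       (off1 / pmd_size) == (off2 / pmd_size) ? "yes" : "no");
	return 0;
}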
> Fixes: c1cc1552616d ("arm64: MMU initialisation")
I wouldn't add a Fixes tag for the first commit adding arm64 support; we
did not even have memory hotplug at the time (it was added later, in
5.7, by commit bbd6ec605c0f ("arm64/mm: Enable memory hot remove")).
IIUC, this hasn't been a problem until commit ba72b4c8cf60
("mm/sparsemem: support sub-section hotplug"). That commit broke some
arm64 assumptions.
> Signed-off-by: Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx>
> ---
>  arch/arm64/mm/mmu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e2739b69e11b..fd59ee44960e 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  {
>  	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>
> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
> +	    !IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
> +	    !IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>  		return vmemmap_populate_basepages(start, end, node, altmap);
>  	else
>  		return vmemmap_populate_hugepages(start, end, node, altmap);
An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
nuking the whole vmemmap PMD section if it's not empty. Not sure how
easy that is, or whether we have the necessary information (I haven't
looked in detail).
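Something along these lines, purely as an illustration of the idea
rather than actual kernel code (the presence tracking below is made up;
the kernel would need the equivalent information, presumably from the
sparsemem subsection map):

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model: with 4K pages, one 2MB vmemmap PMD backs the struct pages
 * of a whole 128MB section, i.e. 64 2MB subsections. Only free the
 * PMD-mapped block once no subsection backed by it is still present.
 */
#define SUBSECTIONS_PER_SECTION	64

static bool vmemmap_pmd_still_used(const bool present[SUBSECTIONS_PER_SECTION])
{
	for (int i = 0; i < SUBSECTIONS_PER_SECTION; i++)
		if (present[i])
			return true;
	return false;
}

int main(void)
{
	bool present[SUBSECTIONS_PER_SECTION] = { false };

	present[1] = true;	/* pagemap2 still hot-added */
	/* pagemap1 (index 0) has just been hot-removed */

	if (vmemmap_pmd_still_used(present))
		printf("keep the vmemmap PMD, other subsections still use it\n");
	else
		printf("safe to free the whole 2MB vmemmap block\n");
	return 0;
}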
A potential issue: can we hotplug 128MB of RAM and unplug only 2MB of
it? If that's possible, the problem isn't solved by this patch.