[tip: x86/urgent] x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
From: tip-bot2 for Myrrh Periwinkle
Date: Mon Apr 07 2025 - 13:31:39 EST
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: f2f29da9f0d4367f6ff35e0d9d021257bb53e273
Gitweb: https://git.kernel.org/tip/f2f29da9f0d4367f6ff35e0d9d021257bb53e273
Author: Myrrh Periwinkle <myrrhperiwinkle@xxxxxxxxxxx>
AuthorDate: Sun, 06 Apr 2025 11:45:22 +07:00
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Mon, 07 Apr 2025 19:20:08 +02:00
x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io@xxxxxxxxxx>
Tested-by: Roberto Ricci <io@xxxxxxxxxx>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle@xxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
Cc: Len Brown <len.brown@xxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@xxxxxxxxxxx
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
---
arch/x86/kernel/e820.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0..9d8dd8d 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -753,22 +753,21 @@ void __init e820__memory_setup_extended(u64 phys_addr, u32 data_len)
void __init e820__register_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ u64 last_addr = 0;
for (i = 0; i < e820_table->nr_entries; i++) {
struct e820_entry *entry = &e820_table->entries[i];
- if (pfn < PFN_UP(entry->addr))
- register_nosave_region(pfn, PFN_UP(entry->addr));
-
- pfn = PFN_DOWN(entry->addr + entry->size);
-
if (entry->type != E820_TYPE_RAM)
- register_nosave_region(PFN_UP(entry->addr), pfn);
+ continue;
- if (pfn >= limit_pfn)
- break;
+ if (last_addr < entry->addr)
+ register_nosave_region(PFN_DOWN(last_addr), PFN_UP(entry->addr));
+
+ last_addr = entry->addr + entry->size;
}
+
+ register_nosave_region(PFN_DOWN(last_addr), limit_pfn);
}
#ifdef CONFIG_ACPI