Re: [PATCH v3] x86/e820: Fix handling of subpage regions when calculating nosave ranges

From: Myrrh Periwinkle
Date: Sun Apr 06 2025 - 21:13:00 EST


On 4/7/25 07:53, Myrrh Periwinkle wrote:

On 4/7/25 01:36, Ingo Molnar wrote:

* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

* Myrrh Periwinkle <myrrhperiwinkle@xxxxxxxxxxx> wrote:

The current implementation of e820__register_nosave_regions suffers from
multiple serious issues:
  - The end of the last region is tracked by PFN, causing it to find
    holes that aren't there if two consecutive subpage regions are present
  - The nosave PFN ranges derived from holes are rounded out (instead of
    rounded in) which makes it inconsistent with how explicitly reserved
    regions are handled

Fix this by:
  - Treating reserved regions as if they were holes, to ensure consistent
    handling (rounding out nosave PFN ranges is more correct as the
    kernel does not use partial pages)
  - Tracking the end of the last RAM region by address instead of pages
    to detect holes more precisely
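
To illustrate the first issue, here is a minimal userspace sketch (not the
kernel code itself). It assumes 4 KiB pages; struct ram_region, demo_map and
both helper functions are hypothetical names made up for this example. It
only counts the holes each tracking strategy would report for a RAM-only map:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define PFN_DOWN(x) ((uint64_t)(x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((uint64_t)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical stand-in for a RAM-type e820 entry. */
struct ram_region {
	uint64_t addr;
	uint64_t size;
};

/* Remember the end of the previous region as a PFN (rounded down). */
static size_t holes_tracked_by_pfn(const struct ram_region *r, size_t n)
{
	uint64_t pfn = 0;
	size_t holes = 0;

	for (size_t i = 0; i < n; i++) {
		/* A "hole" is reported whenever the next region starts past pfn. */
		if (pfn < PFN_UP(r[i].addr))
			holes++;
		pfn = PFN_DOWN(r[i].addr + r[i].size);
	}
	return holes;
}

/* Remember the end of the previous region as a byte address. */
static size_t holes_tracked_by_addr(const struct ram_region *r, size_t n)
{
	uint64_t last_addr = 0;
	size_t holes = 0;

	for (size_t i = 0; i < n; i++) {
		/* Only a real gap between byte addresses counts as a hole. */
		if (last_addr < r[i].addr)
			holes++;
		last_addr = r[i].addr + r[i].size;
	}
	return holes;
}

/* Two contiguous regions whose shared boundary (0x1800) is mid-page. */
static const struct ram_region demo_map[] = {
	{ 0x0000, 0x1800 },
	{ 0x1800, 0x2800 },
};
```

With demo_map, the PFN-based helper reports one hole (page 1) even though the
two regions are contiguous, while the address-based helper reports none.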

Cc: stable@xxxxxxxxxxxxxxx
Fixes: e5540f875404 ("x86/boot/e820: Consolidate 'struct e820_entry *entry' local variable names")
So why is this SHA1 indicated as the root cause? AFAICS that commit
does nothing but cleanups, so it cannot cause such regressions.
BTW.:

  A) "It was the first random commit that seemed related, sry"
  B) "It's a 15-year-old bug, but I wanted to indicate a fresh, 8-year-old bug to get this into -stable. Busted!"

You got me :) How did you know that this is a 15-year-old bug? (Although I didn't think the age of the bug a patch fixes would affect its chances of getting into -stable.)

This specific revision was picked because it's the latest one that this patch applies to straightforwardly, although there is still a trivial merge conflict with -stable.

Later, I managed to track the buggy logic back to 1c10070a55a3 ("i386: do not restore reserved memory after hibernation"), which I believe is the very first occurrence of this bug. If you prefer, I can send a v4 with a more correct Fixes: tag (or feel free to do so yourself when applying this patch).

I did some more digging, and the buggy logic seems to have first appeared all the way back in e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64"), when x86_64 was still a separate port. The i386 port later copied it in the commit I mentioned above, which would make this a 19-year-old bug instead of a 15-year-old one.


... are perfectly fine answers in my book. :-)

I'm glad about the fixes, I'm just curious how the Fixes tag came about.

Thanks,

    Ingo

Regards,

Myrrh