Re: [tip: x86/urgent] x86/setup: Always reserve the first 1M of RAM
From: Andy Lutomirski
Date: Wed Mar 01 2023 - 22:52:11 EST
On Thu, Jun 3, 2021, at 11:01 AM, tip-bot2 for Mike Rapoport wrote:
> The following commit has been merged into the x86/urgent branch of tip:
>
> Commit-ID: f1d4d47c5851b348b7713007e152bc68b94d728b
> Gitweb:
> https://git.kernel.org/tip/f1d4d47c5851b348b7713007e152bc68b94d728b
> Author: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> AuthorDate: Tue, 01 Jun 2021 10:53:52 +03:00
> Committer: Borislav Petkov <bp@xxxxxxx>
> CommitterDate: Thu, 03 Jun 2021 19:57:55 +02:00
>
> x86/setup: Always reserve the first 1M of RAM
>
> There are BIOSes that are known to corrupt the memory under 1M, or more
> precisely under 640K because the memory above 640K is anyway reserved
> for the EGA/VGA frame buffer and BIOS.
>
> To prevent usage of the memory that will be potentially clobbered by the
> kernel, the beginning of the memory is always reserved. The exact size
> of the reserved area is determined by CONFIG_X86_RESERVE_LOW build time
> and the "reservelow=" command line option. The reserved range may be
> from 4K to 640K with the default of 64K. There are also configurations
> that reserve the entire 1M range, like machines with SandyBridge graphic
> devices or systems that enable crash kernel.
>
> In addition to the potentially clobbered memory, EBDA of unknown size may
> be as low as 128K and the memory above that EBDA start is also reserved
> early.
>
> It would have been possible to reserve the entire range under 1M unless for
> the real mode trampoline that must reside in that area.
>
> To accommodate placement of the real mode trampoline and keep the memory
> safe from being clobbered by BIOS, reserve the first 64K of RAM before
> memory allocations are possible and then, after the real mode trampoline
> is allocated, reserve the entire range from 0 to 1M.
>
> Update trim_snb_memory() and reserve_real_mode() to avoid redundant
> reservations of the same memory range.
>
> Also make sure the memory under 1M is not getting freed by
> efi_free_boot_services().
This is quite broken. The comments in the patch seem to understand that Linux tries twice to allocate the real mode trampoline, but the code has some issues.
First, it actively breaks the logic here:
+ /*
+ * Don't free memory under 1M for two reasons:
+ * - BIOS might clobber it
+ * - Crash kernel needs it to be reserved
+ */
+ if (start + size < SZ_1M)
+ continue;
+ if (start < SZ_1M) {
+ size -= (SZ_1M - start);
+ start = SZ_1M;
+ }
+
The whole point is that, if we fail to allocate a trampoline, we free boot services and try again. But if we can't free boot services below 1M, then we can't allocate a trampoline in boot services memory. And then it does:
+ /*
+ * Unconditionally reserve the entire fisrt 1M, see comment in
+ * setup_arch().
+ */
+ memblock_reserve(0, SZ_1M);
But this runs even if we just failed to allocate a trampoline on the first try, again dooming the kernel to panic.
I real the commit message and the linked bug, and I'm having trouble finding evidence of anything actually fixed by this patch. Can we just revert it? If not, it would be nice to get a fixup patch that genuinely cleans this up -- the whole structure of the code (first, try to allocate trampoline, then free boot services, then try again) isn't really conducive to a model where we *don't* free boot services < 1M.
Discovered by my delightful laptop, which does not boot with this patch applied.
--Andy