Re: [PATCH] x86/boot: Reorganize and clean up the BIOS area reservation code

From: Andy Lutomirski
Date: Mon Jul 25 2016 - 20:42:01 EST

On Fri, Jul 22, 2016 at 6:00 AM, Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 21 Jul, at 03:45:14PM, Andy Lutomirski wrote:
>> I looked at the code some more. The boot services quirk is weird and
>> maybe buggy. trim_snb_memory uses memblock_reserve to reserve the
>> bottom 1MB. If efi_reserve_real_mode has already reserved that range,
>> then trim_snb_memory's reservation will have no effect because the efi
>> code will just free it later on. The same issue will hit any code
>> that reserves >1MB memory after efi has tried to temporarily reserve
>> it.
> Yeah, that looks like a bug. memblock_reserve() reference counting,
> anyone?
>> I don't have any great suggestions for cleaning it up. Perhaps the
>> efi code should instead skip adding boot services memory to the memory
>> map in the first place and then add it late and hand any unreserved
>> bits to the buddy allocator?
> The issue is that some data required at runtime may be contained in
> those boot services data regions; the EFI System Resource Table is a
> good example or the ACPI BGRT table. esrt_init() happens pretty early
> but efi_bgrt_init() is really late in boot because we need the ACPI
> subsystem to have been brought up.

I still think my suggestion works. Let me clarify it:

The memblock allocator (AFAICT) has separate tracking of ranges that
exist and ranges that are reserved. That is, there are four possible
states a range can be in:

 - existing, non-reserved: these are available for use
 - existing, reserved: these ranges are present but either in use or
   blacklisted
 - non-existent, reserved: not present but blacklisted anyway
 - non-existent, non-reserved: nothing here
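
To make the four states concrete, here is a toy user-space model. It is
only a sketch under the assumption stated above (two independent range
lists, one for "exists" and one for "reserved"); every toy_* name is
hypothetical and this is not the kernel's memblock code or API.

```c
/* Toy user-space model of "exists" vs. "reserved" tracking.
 * All toy_* names are made up for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RANGES 16

struct toy_range { uint64_t start, end; };	/* half-open: [start, end) */

struct toy_list {
	struct toy_range r[TOY_MAX_RANGES];
	size_t n;
};

struct toy_memblock {
	struct toy_list memory;		/* ranges that exist */
	struct toy_list reserved;	/* ranges that must not be handed out */
};

enum toy_state {
	TOY_EXISTING_FREE,	/* existing, non-reserved */
	TOY_EXISTING_RESERVED,	/* existing, reserved */
	TOY_ABSENT_RESERVED,	/* non-existent, reserved */
	TOY_ABSENT_FREE,	/* non-existent, non-reserved */
};

/* Append [start, end) to a list; no merging, which is enough for a sketch. */
static void toy_list_add(struct toy_list *l, uint64_t start, uint64_t end)
{
	assert(start < end && l->n < TOY_MAX_RANGES);
	l->r[l->n].start = start;
	l->r[l->n].end = end;
	l->n++;
}

static bool toy_list_contains(const struct toy_list *l, uint64_t addr)
{
	for (size_t i = 0; i < l->n; i++)
		if (addr >= l->r[i].start && addr < l->r[i].end)
			return true;
	return false;
}

/* The two lists are consulted independently, giving four combinations. */
static enum toy_state toy_classify(const struct toy_memblock *mb, uint64_t addr)
{
	bool exists = toy_list_contains(&mb->memory, addr);
	bool reserved = toy_list_contains(&mb->reserved, addr);

	if (exists)
		return reserved ? TOY_EXISTING_RESERVED : TOY_EXISTING_FREE;
	return reserved ? TOY_ABSENT_RESERVED : TOY_ABSENT_FREE;
}
```

With memory = [0x0, 0x2000) and reserved = [0x1000, 0x3000), an address
in the overlap is "existing, reserved", and an address in
[0x2000, 0x3000) is "non-existent, reserved".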

Currently, boot services data is marked as existing (because the e820
code thinks it's real memory) and reserved (because the EFI code
reserves it).

I'm proposing that it work the other way around: the EFI and e820 code
should, during early boot, treat it just like runtime data, reserved
space, or non-present space: simply don't add it to the memory map in
the first place. That will cause it to be non-existent and
non-reserved. Nothing will clobber it because it's not available to
the memblock allocator.

Then, in late boot, add it back in to the memblock allocator. At that
point the blacklisted portions will be reserved and the non-blacklisted
portions will be non-reserved. If this happens after we switch from
memblock to the normal page allocator, then the code will have to be
structured differently, but the same concept applies.

IOW, just pretend that the boot services memory is initially not
present and then treat it as hot-added memory after
SetVirtualAddressMap() (SVAM) is done.
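
A rough page-granular sketch of that ordering, as a toy user-space
model rather than kernel code. The toy_* names, the 16-page map, and
the choice of which pages are boot services or blacklisted are all
assumptions made up for illustration.

```c
/* Toy model: boot services pages start out absent and are added late.
 * All names and sizes here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

struct toy_map {
	bool present[TOY_NPAGES];	/* page is known to the allocator */
	bool reserved[TOY_NPAGES];	/* page must not be handed out */
};

static void toy_mark(bool *bits, size_t first, size_t count, bool value)
{
	assert(first + count <= TOY_NPAGES);
	for (size_t i = first; i < first + count; i++)
		bits[i] = value;
}

/* A page can be handed out only if it exists and is not reserved. */
static bool toy_page_usable(const struct toy_map *m, size_t page)
{
	return page < TOY_NPAGES && m->present[page] && !m->reserved[page];
}

static size_t toy_count_usable(const struct toy_map *m, size_t first,
			       size_t count)
{
	size_t n = 0;

	for (size_t i = first; i < first + count; i++)
		if (toy_page_usable(m, i))
			n++;
	return n;
}

/* Early boot: everything is added except the boot services pages, so
 * nothing can allocate from (and clobber) them. */
static void toy_early_boot(struct toy_map *m, size_t bs_first, size_t bs_count)
{
	toy_mark(m->present, 0, TOY_NPAGES, true);
	toy_mark(m->present, bs_first, bs_count, false);
}

/* Late boot: reserve the blacklisted part first, then "hot-add" the
 * whole boot services range. Only the rest becomes usable. */
static void toy_late_add(struct toy_map *m, size_t bs_first, size_t bs_count,
			 size_t keep_first, size_t keep_count)
{
	toy_mark(m->reserved, keep_first, keep_count, true);
	toy_mark(m->present, bs_first, bs_count, true);
}
```

For example, with pages 4-7 as boot services and pages 5-6
blacklisted, none of pages 4-7 is usable after toy_early_boot(), and
only pages 4 and 7 become usable after toy_late_add().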