Re: [PATCH] x86/boot: Reorganize and clean up the BIOS area reservation code

From: Andy Lutomirski
Date: Thu Jul 21 2016 - 10:58:43 EST


On Jul 21, 2016 1:14 AM, "Ingo Molnar" <mingo@xxxxxxxxxx> wrote:
>
>
> * Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> > Under some conditions, my Dell XPS 13 9350 puts the EBDA at 0x2c000
> > but reports the lowmem cutoff as 0. The old code reserves
> > everything above 0x2c000 and I can't boot [1].
>
> > [1] This only breaks boot in practice when some other firmware or
> > GRUB oddity that I don't fully understand kicks in causing the
> > memory below 0x2c000 to be unusable.
>
> Exactly why can't Linux boot if *more* memory is reserved? Is it perhaps the SMP
> trampoline that cannot be allocated?

Yes, exactly.

>
> Is the boot failure deterministic - if yes, could you try to dig a bit more into
> this?

It's mostly deterministic. I hit it every time if I use Dell's latest
BIOS (1.4.4), enable SGX in the BIOS setup (no SGX kernel patches
involved), and boot using Fedora's grub2-efi on the hard disk. I don't
hit it when booting from a USB stick or when booting the EFI stub via
the EFI shell. Using the EFI shell causes 0x1000-0x27fff to be
conventional memory instead of boot data -- see below.

Here's my memory map:

[ 0.000000] efi: mem00: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x0000000000000000-0x0000000000000fff] (0MB)
[ 0.000000] efi: mem01: [Boot Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000001000-0x0000000000027fff] (0MB)
[ 0.000000] efi: mem02: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000028000-0x0000000000029fff] (0MB)
[ 0.000000] efi: mem03: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002a000-0x000000000002bfff] (0MB)
[ 0.000000] efi: mem04: [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x000000000002c000-0x000000000002cfff] (0MB)
[ 0.000000] efi: mem05: [Loader Data | | | | | | | | |WB|WT|WC|UC] range=[0x000000000002d000-0x000000000002dfff] (0MB)
[ 0.000000] efi: mem06: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x000000000002e000-0x0000000000057fff] (0MB)
[ 0.000000] efi: mem07: [Reserved | | | | | | | | |WB|WT|WC|UC] range=[0x0000000000058000-0x0000000000058fff] (0MB)
[ 0.000000] efi: mem08: [Conventional Memory| | | | | | | | |WB|WT|WC|UC] range=[0x0000000000059000-0x000000000009ffff] (0MB)
[

The EFI quirk that reserves boot services data kills 0x1000-0x27fff.
The EBDA reservation code kills the rest, leaving no usable memory
below 1MB at all.
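
The arithmetic can be checked with a small model. The following is only a
sketch in Python, not kernel code: the ranges are copied from the efi: memXX
lines above, while the helper names and the choice of which EFI types count as
allocatable (Boot Data and Conventional Memory only, with Loader Data treated
as already in use) are assumptions made for illustration.

```python
# Toy model of the low-memory situation described above; not kernel code.
ONE_MB = 0x100000

# (start, inclusive end, EFI type), copied from the efi: memXX lines.
EFI_MAP = [
    (0x00000, 0x00fff, "Runtime Data"),
    (0x01000, 0x27fff, "Boot Data"),
    (0x28000, 0x29fff, "Loader Data"),
    (0x2a000, 0x2bfff, "Reserved"),
    (0x2c000, 0x2cfff, "Runtime Data"),
    (0x2d000, 0x2dfff, "Loader Data"),
    (0x2e000, 0x57fff, "Conventional Memory"),
    (0x58000, 0x58fff, "Reserved"),
    (0x59000, 0x9ffff, "Conventional Memory"),
]

# Assumption for this sketch: only these types are candidates for allocation.
ALLOCATABLE = {"Boot Data", "Conventional Memory"}

# Half-open [start, end) reservations discussed in the thread.
BOOT_DATA_QUIRK = (0x1000, 0x28000)        # EFI boot data quirk
OLD_EBDA_RESERVATION = (0x2c000, ONE_MB)   # old code: everything above the EBDA


def subtract(ranges, hole):
    """Remove the half-open range `hole` from a list of half-open ranges."""
    hole_start, hole_end = hole
    out = []
    for start, end in ranges:
        if hole_end <= start or hole_start >= end:
            out.append((start, end))        # no overlap, keep as is
            continue
        if start < hole_start:
            out.append((start, hole_start))  # piece left below the hole
        if hole_end < end:
            out.append((hole_end, end))      # piece left above the hole
    return out


def free_low_memory(reservations):
    """Return the allocatable ranges below 1MB after applying reservations."""
    free = [(s, e + 1) for s, e, kind in EFI_MAP if kind in ALLOCATABLE]
    for reservation in reservations:
        free = subtract(free, reservation)
    return free


if __name__ == "__main__":
    both = free_low_memory([BOOT_DATA_QUIRK, OLD_EBDA_RESERVATION])
    print("free ranges with both reservations:", both)
```

With both reservations applied, the model leaves nothing free. With only the
EBDA reservation applied (roughly the EFI shell case, where 0x1000-0x27fff is
conventional memory and the quirk has nothing to reserve there), the range
0x1000-0x27fff survives.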

>
> My guess is that it's the SMP trampoline, and I think we should robustify that
> in a different way: let's put it aside very early as a reservation (possibly in
> this very function), to guarantee that we have a below 1MB buffer for the SMP
> trampoline. This would be a lot more robust ...
>

If we really want to robustify that, I would suggest that we change
the way that the trampoline works. In particular, I don't see any
reason why we need to call setup_real_mode until we're actually ready
to initialize APs, and we should be done with the boot services data
quirk by then (am I right, Matt?). So if we can get the allocation
code right, we shouldn't have any problem putting the trampoline in
the boot services range.

It would be very easy to implement this if we could handle overlapping
memblocks precisely or set a lower limit on the memblock allocator.
Then we could block off everything below 1MB or 2MB very early and,
later on, either unblock it or temporarily change the lower limit and
ask for a single page for the trampoline.
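
As a rough illustration of the lower-limit idea, here is a toy first-fit
allocator in Python. It is a sketch under stated assumptions, not memblock:
the class `ToyAllocator`, its `min_addr` field, the `max_addr` argument, and
the free ranges used in `demo()` are all hypothetical, and every range is
assumed to be page-aligned.

```python
# Toy model of an allocator with a lower address limit; not the memblock API.
PAGE = 0x1000
ONE_MB = 0x100000


class ToyAllocator:
    """Hypothetical first-fit allocator over half-open [start, end) ranges."""

    def __init__(self, free_ranges, min_addr=0):
        self.free = sorted(free_ranges)
        self.min_addr = min_addr  # allocations never start below this address

    def alloc(self, size, max_addr=None):
        """Return the start of a block of `size` bytes, or None if none fits."""
        for i, (start, end) in enumerate(self.free):
            lo = max(start, self.min_addr)
            hi = end if max_addr is None else min(end, max_addr)
            if hi - lo >= size:
                # Carve [lo, lo + size) out of this free range.
                pieces = []
                if start < lo:
                    pieces.append((start, lo))
                if lo + size < end:
                    pieces.append((lo + size, end))
                self.free[i:i + 1] = pieces
                return lo
        return None


def demo():
    # Assumed free ranges: one low range and one made-up range above 1MB.
    allocator = ToyAllocator([(0x1000, 0x28000), (ONE_MB, 0x4000000)],
                             min_addr=ONE_MB)

    # Early boot: everything below 1MB is blocked off by the lower limit.
    early = allocator.alloc(16 * PAGE)

    # Later, once low memory is safe to use: drop the limit temporarily
    # and ask for a single page below 1MB for the trampoline.
    allocator.min_addr = 0
    trampoline = allocator.alloc(PAGE, max_addr=ONE_MB)
    allocator.min_addr = ONE_MB  # restore the limit afterwards

    return early, trampoline


if __name__ == "__main__":
    early, trampoline = demo()
    print(hex(early), hex(trampoline))
```

The point of the sketch is that blocking off low memory becomes a single
adjustable limit rather than a reservation that has to be undone precisely,
so there are no overlapping regions to untangle.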