Re: PLEASE REVERT URGENTLY: Re: [PATCH v5 2/3] x86/boot: add acpi rsdp address to setup_header

From: H. Peter Anvin
Date: Sun Nov 11 2018 - 13:50:25 EST


On 11/10/18 1:03 AM, Juergen Gross wrote:
>
> How would that help? The garabge data written could have the correct
> terminal sentinel value by chance.
>
> That's why I re-used an existing field in setup_header (the version) to
> let grub tell the kernel which part of setup_header was written by grub.
>
> That's the only way I could find to let the kernel distinguish between
> garbage and actual data.

There is plenty of space *before* the setup_header part of struct boot_params
too -- look a the various __pad fields, especially (in your case), __pad3[16]
and __pad4[116] would suit the bill just fine.

>> It would be enormously helpful if you could find out any more details about
>> exactly what they are doing to break things.
>
> That's easy:
>
> The memory layout is:
>
> 0x1f1 bytes of data, including the sentinel, the setup_header, and then
> more data.
>
> grub did read the kernel's setup_header in the correct size into its
> buffer (which contains random garbage before that), intializes the first
> 0x1f1 including the sentinel byte, and then writes back the buffer, but
> using a too large length for that.

Are you kidding me... it really overwrites it with completely random data, and
not simply overspilling contents of the file?

In that case it might not be possible (or desirable) to use those N bytes
following the setup_heaader, or we need to a bigger sentinel than one byte
(probability being what it is, 256^n gets to be a pretty big number for any n,
very quickly drowning in the noise compared to other potential sources of boot
failures, and most likely less fatal than most.)

How big is this garbage dump? I'm going to brave a guess it is 512 bytes. In
that case this is hardly a big deal: the E820 map begins at 0x290, and the
setup_header maximum goes to 0x280, so it is only 15 bytes lost. If it is
worse than that, we would risk losing __pad8[48] and __pad9[276], and
especially the latter would be painful. In those case perhaps we should use
0x281..0x290 as a 15-byte sentinel; that is going to be virtually foolproof.

I'm also thinking that it might be desirable to add a flags field (__pad2
would be ideal) to struct boot_params; it would let us recycle some of the
obsolete fields (hd0_info, hd1_info, sys_desc_table, olpc_ofw_header, ...) and
perhaps be able to add some more robustness against these sort of things. This
would be the right way to do what your version feedback mechanism would do.

The reason why the feedback mechanism is fundamentally broken is that it only
gives the boot loader a way to assert that it supports a certain version of
the protocol, but it doesn't say *which* bootloader does such an assert -- and
therefore it is still wide open to implementation error.

We do, in fact, already have a feedback mechanism: the bootloader ID and
bootloader version. One way we could deal with this problem is to bump the
bootloader version reported by Grub, and add a quirk to the kernel that if the
bootloader ID is Grub (7) and the version is less than a certain number, zero
those fields. In fact, the more I think about it, this is what we should do.

That being said, Grub really needs to be kept honest. They keep making both
severe design (like refusing to use the BIOS and UEFI entry points provided by
the kernel by default) and implementation errors, apparently without
meaningful oversight. I really ask that the people more closely involved with
Grub try to keep a closer eye on their code as it applies to Linux.

-hpa