Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
From: Ard Biesheuvel
Date: Fri Apr 10 2020 - 11:26:48 EST
On Fri, 10 Apr 2020 at 16:48, Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
>
> On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> > On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > > (add Peter, Leif and Daniel)
> > > > >
> > > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > > Commit
> > > > > > >
> > > > > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > > bzImage")
> > > > > > >
> > > > > > > removed the .bss section from the bzImage.
> > > > > > >
> > > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > > the bootloader prior to calling the handover entry.
> > > > > > >
> > > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > > executable that contains a small stub loader from systemd together with
> > > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > > been allocated.
> > > > > >
> > > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > > > > with early efi failure. With these two patches applied, it works well
> > > > > > then.
> > > > > >
> > > > > > BTW, I use Fedora 31 + Grub2
> > > > > >
> > > > >
> > > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > > it exceeds the file size?
> > > >
> > > > Dave, that comment was because the previous report was for systemd-boot
> > > > stub.
> > > >
> > > > Ard, should I revise the commit message to make it clear it's not
> > > > restricted to systemd-boot but anything using handover entry may be
> > > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > > line to say grub2 with the EFI stub patches is also impacted?
> > > >
> > >
> > > Well, the fact the /some/ piece of software is used in production that
> > > relies on the ill-defined EFI handover protocol is sufficient
> > > justification, so I don't think it is hugely important to update it.
> > >
> > > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > > >
> > > > + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > > + BYTES_TO_PAGES(lh.init_size));
> > > >
> > > > Looking at this, grub does allocate init_size for the image, but it
> > > > doesn't zero it out.
> > > >
> > > > This call also looks wrong to me though. It allocates at max address of
> > > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > > would cause the startup code to relocate the kernel into unallocated
> > > > memory. On a mixed-mode boot, this would cause the early page tables
> > > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > > and potentially get clobbered by the EFI stub.
> > > >
> > > > The first try to allocate pref_address should be calling
> > > > grub_efi_allocate_fixed instead.
> > >
> > > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > > to get these logged somewhere.
> >
> > Ok. For dracut, the process for building the unified kernel image needs
> > a check to make sure the kernel can fit in the space provided for it --
> > there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> > size, so there's some slack left at present.
> >
> > Additionally, in mixed-mode, the unified kernel images are quite likely
> > to end up with early pgtables from startup_32 clobbering the initrd,
> > independently of the recent kernel changes. Hopefully no-one actually
> > uses these in mixed-mode.
>
> The grub EFI handover entry patch is busted in mixed-mode for another
> reason -- while it allocates init_size, it doesn't use the correct
> alignment. I tested on a Debian buster VM in mixed-mode (that was the
> one I was able to get to install/boot with mixed-mode), and the early
> pagetable from startup_32 ends up in unallocated memory due to the
> rounding up of the bzImage address to account for kernel alignment. This
> would be an existing problem prior to these patches.
>
> Should we try to handle this in the kernel? At some point KASLR is going
> to pick that memory for the kernel and overwrite the pagetables I would
> think, resulting in sporadic crashes that are almost unreproducible.
Upstream GRUB does not implement the EFI handover protocol at all, and
the distros all have their own GRUB forks that implement this along
with mixed mode, secure boot, shim, measured boot etc.
What you are saying is that GRUB forks turn out to exist that violate
both the PE/COFF specification and the Linux/x86 boot protocol in a
way that might break mixed mode, and nobody noticed until you happened
to find it by code inspection. While I appreciate the effort, I think
this is where I would like to draw the line, and say that there is
only so much we can do to work around bugs in out-of-tree forks of
other projects. So unless it can be done cleanly and without losing
any of the benefits of the recent cleanup and optimization work, I'd
say don't bother.