Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

From: Arvind Sankar
Date: Sun Apr 05 2020 - 23:55:04 EST


On Mon, Apr 06, 2020 at 07:00:39AM +0700, Sergey Shatunov wrote:
> On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> > I'm not familiar with systemd-boot: when you say systemd-boot stub, is
> > that something different from the kernel's EFI_STUB option? Or is it
> > just a kernel with EFI_STUB enabled and with builtin initramfs +
> > builtin cmdline?
> Basically, the systemd-boot stub is an EFI application that packs an
> EFI_STUB-enabled kernel, initrd and cmdline into a single file. The
> source can be found here:
> https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c
>
> It doesn't do anything unusual; it just extracts the data from those
> sections and calls the EFI handover entry.
>
> The final image is created by objcopy'ing the precompiled stub and
> adding sections containing that data:
>
> objcopy \
> --add-section .osrel=os_release --change-section-vma '.osrel=0x20000' \
> --add-section .cmdline=cmdline --change-section-vma '.cmdline=0x30000' \
> --add-section .linux=vmlinuz --change-section-vma '.linux=0x2000000' \
> --add-section .initrd=initrd --change-section-vma '.initrd=0x3000000' \
> /usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi

So this embeds the bzImage, which is itself a PE executable, inside
another PE executable. Before my patch, the bss section was explicitly
part of the bzImage and so would have been all zeroes; now it isn't any
more, and the PE loader is expected to zero it out before executing.
systemd-boot's stub doesn't do that before jumping to the EFI handover
entry, so the issue must be that bss contains garbage. I'm not 100% sure
why that leads to a crash, as the only variables in bss in the EFI stub
are for some boolean EFI command line arguments, so it ought to have
still worked, just as though it were invoked with random arguments.
Anyway, we need to handle an uninitialized bss to get this to work
properly.

I also see from the systemd [0] and dracut [1] sources that these VMAs
appear to be hardcoded, with no check on how big the files actually are,
and objcopy doesn't seem to complain if sections end up overlapping.

So since dracut commit [2], the space available for the .linux section
containing the bzImage has shrunk from ~48MiB to 16MiB. This will
hopefully still fit the compressed kernel (although an allyesconfig
bzImage is far bigger than even 48MiB), but in-place decompression is
unlikely to be possible even for a normal config. That will break
another patchset merged into mainline for 5.7 [3,4], which tries to
avoid copying the kernel unless necessary and so has a good chance of
triggering in-place decompression if KASLR is disabled.

I'll get systemd-boot installed here so I can reproduce and implement
some workarounds for both issues. I should hopefully have a fix in a day
or two.

[0] https://github.com/systemd/systemd/blob/9fac14980df8dcce922e1fe8856a88b09590d2c3/test/test-efi-create-disk.sh#L30
[1] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/tree/dracut.sh#n2039
[2] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=4237aeb040c276722b528001bdea31e6eb994d06
[3] https://lore.kernel.org/linux-efi/20200303221205.4048668-1-nivedita@xxxxxxxxxxxx/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5cdf4cfeac914617ca22866bd4685fd7f876dec