Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
From: Ard Biesheuvel
Date: Mon Apr 06 2020 - 12:22:47 EST
On Mon, 6 Apr 2020 at 18:01, Arvind Sankar <nivedita@xxxxxxxxxxxx> wrote:
>
> On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > > >
> > > > > Actually, it may be sufficient to #define __efistub_global to
> > > > > __section(.data) like we already do for ARM, to ensure that these
> > > > > global flags are always initialized correctly. (I'll wait for Sergey
> > > > > to confirm that the spurious enabling of the PCI DMA protection
> > > > > resulting from this BSS issue is causing the boot regression)
> > >
> > > Yeah I thought of that as the easiest fix, but it might be safer to
> > > explicitly zero-init in efi_main to avoid future problems in case
> > > someone adds another variable in bss and isn't aware of this obscure
> > > requirement. We actually already have sys_table in bss, but that one is
> > > always initialized. There's also other globals that aren't annotated
> > > (but not in bss by virtue of having initializers). What do you think?
> > >
> >
> > *If* we zero init BSS, I'd prefer for it to be done in the EFI
> > handover protocol entrypoint only. But does that fix the issue that
> > BSS lives outside of the memory footprint of the kernel image?
> >
>
> Yes, I was thinking of only doing it if we didn't come through the
> pe_entry. We could also avoid re-parsing the command line if we add a
> global flag to indicate that.
>
Yeah. I was trying to avoid that, but if we end up needing to
distinguish between the two cases anyway, we might just as well
optimize that too.
> Regarding the memory footprint, if there's no initrd that might be a
> problem, since in that case ImageSize will only cover the actual data
> from the bzImage, so it would be safer to move them to data (including
> sys_table).
>
> We could just pull the stub's bss section all into data with objcopy
> similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> then pull that into .data in the linker script for the compressed
> kernel?
>
I don't follow. I'm not aware of arm64 doing anything out of the
ordinary with .bss? Note that arm64 does not have a separate
decompressor, so the EFI stub is part of the kernel proper. This is
why sections are renamed to start with .init
> There is also this scary looking comment in gnu-efi's linker script:
> /* the EFI loader doesn't seem to like a .bss section, so we stick
> it all into .data: */
> I don't know what the history of that is.
>
I don't remember, to be honest, but I'm pretty sure I copy-pasted that
from elsewhere at the time.
> [1] As an aside, why doesn't ARM do this as well rather than using the
> section(.data) annotation?
>
The ARM decompressor has this hideous hack, where text [and rodata]
executes straight from flash, and BSS is relocated into DRAM. In order
for this to work, it actually *requires* GOT indirections for BSS
items, to ensure that all BSS references use absolute addresses, which
can be relocated independently from the rest of the kernel image. This
is the reason ARM does not permit a .data section in the decompressor.
However, the EFI stub does not support execute in place from flash,
and so we can permit a .data section there. At the same time, we don't
support GOT indirections in the EFI stub, so we cannot use the BSS
section. So instead, we just put the few BSS pieces we have into .data
instead.
> >
> > > What do you think of the other problem -- that's actually worse to fix,
> > > as it won't just be when kaslr is disabled, the startup_64 code will do
> > > relocation to the end of init_size and clobber the initrd before getting
> > > to the kaslr code, so it will break as soon as the firmware loads the
> > > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > > think of is to just unconditionally call efi_relocate_kernel if we were
> > > entered via handover_entry?
> > >
> >
> > Yes, that seems to be the most robust approach.