Re: [PATCH] x86/build: Fix vmlinux size check on 64-bit

From: Arvind Sankar
Date: Tue Oct 27 2020 - 17:16:27 EST


On Tue, Oct 27, 2020 at 09:08:03PM +0100, Borislav Petkov wrote:
> On Mon, Oct 05, 2020 at 11:15:39AM -0400, Arvind Sankar wrote:
> > Commit b4e0409a36f4 ("x86: check vmlinux limits, 64-bit") added a check
> > that the size of the 64-bit kernel is less than KERNEL_IMAGE_SIZE.
> >
> > The check uses (_end - _text), but this is not enough. The initial PMD
> > used in startup_64() (level2_kernel_pgt) can only map up to
> > KERNEL_IMAGE_SIZE from __START_KERNEL_map, not from _text.
> >
> > The correct check is the same as for 32-bit, since LOAD_OFFSET is
> > defined appropriately for the two architectures. Just check
> > (_end - LOAD_OFFSET) against KERNEL_IMAGE_SIZE unconditionally.
> >
> > Signed-off-by: Arvind Sankar <nivedita@xxxxxxxxxxxx>
> > ---
> > arch/x86/kernel/vmlinux.lds.S | 11 ++---------
> > 1 file changed, 2 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> > index bf9e0adb5b7e..b38832821b98 100644
> > --- a/arch/x86/kernel/vmlinux.lds.S
> > +++ b/arch/x86/kernel/vmlinux.lds.S
> > @@ -454,13 +454,12 @@ SECTIONS
> > ASSERT(SIZEOF(.rela.dyn) == 0, "Unexpected run-time relocations (.rela) detected!")
> > }
> >
> > -#ifdef CONFIG_X86_32
> > /*
> > * The ASSERT() sink to . is intentional, for binutils 2.14 compatibility:
> > */
> > . = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
> > "kernel image bigger than KERNEL_IMAGE_SIZE");
> > -#else
> > +#ifdef CONFIG_X86_64
> > /*
> > * Per-cpu symbols which need to be offset from __per_cpu_load
> > * for the boot processor.
> > @@ -470,18 +469,12 @@ INIT_PER_CPU(gdt_page);
> > INIT_PER_CPU(fixed_percpu_data);
> > INIT_PER_CPU(irq_stack_backing_store);
> >
> > -/*
> > - * Build-time check on the image size:
> > - */
> > -. = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
> > - "kernel image bigger than KERNEL_IMAGE_SIZE");
>
> So we have this:
>
> SECTIONS
> {
> #ifdef CONFIG_X86_32
> . = LOAD_OFFSET + LOAD_PHYSICAL_ADDR;
> phys_startup_32 = ABSOLUTE(startup_32 - LOAD_OFFSET);
> #else
> . = __START_KERNEL;
> ^^^^^^^^^^
>
> which sets the location counter to
>
> #define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
>
> which is 0xffffffff80000000 + ALIGN(CONFIG_PHYSICAL_START, CONFIG_PHYSICAL_ALIGN)
>
> and that second term after the '+' has effect only when
> CONFIG_RELOCATABLE=n and that's not really used on modern kernel configs
> as RELOCATABLE is selected by EFI_STUB and RANDOMIZE_BASE depends on it,
> and so on ...
>
> So IOW, in a usual .config we have:
>
> __START_KERNEL_map at 0xffffffff80000000
> _text at 0xffffffff81000000
>
> So practically and for the majority of configs, the kernel image really
> does start at _text and not at __START_KERNEL_map and we map 16Mb which
> is 4 PMDs of unused pages. So basically you're correcting that here -
> that the number tested against KERNEL_IMAGE_SIZE is 16Mb more.
>
> Yes, no?
>
> Or am I missing some more important aspect and this is more than just a
> small correctness fixlet?
>
> Thx.
>

This is indeed just a small correctness fixlet, but I'm not following
the rest of your comments. PHYSICAL_START has an effect independent of
the setting of RELOCATABLE. It's where the kernel image starts in
virtual address space, as shown by the 16MiB difference between
__START_KERNEL_map and _text in the usual .config situation. In all
configs, not just the majority, the kernel image itself starts at _text. The
16MiB gap below _text is not actually mapped, but the important point is
that, the way the initial construction of pagetables is currently set up,
the code cannot map anything above __START_KERNEL_map + KERNEL_IMAGE_SIZE,
so _end needs to be below that.
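
To make the difference between the two checks concrete, here is a rough
sketch in plain C of the arithmetic involved. The constants are
assumptions for a typical 64-bit config (PHYSICAL_START of 16MiB,
PHYSICAL_ALIGN of 2MiB, and a 512MiB KERNEL_IMAGE_SIZE as in a
non-KASLR build); the helper names text_addr(), old_check_ok() and
new_check_ok() are made up for illustration and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a typical x86-64 config; adjust for a real .config. */
#define START_KERNEL_MAP   0xffffffff80000000ULL /* __START_KERNEL_map */
#define PHYSICAL_START     0x1000000ULL          /* CONFIG_PHYSICAL_START, 16MiB */
#define PHYSICAL_ALIGN     0x200000ULL           /* CONFIG_PHYSICAL_ALIGN, 2MiB */
#define KERNEL_IMAGE_SIZE  (512ULL << 20)        /* 512MiB, non-KASLR build */

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Link-time address of _text: __START_KERNEL_map + aligned PHYSICAL_START. */
static uint64_t text_addr(void)
{
	return START_KERNEL_MAP + ALIGN_UP(PHYSICAL_START, PHYSICAL_ALIGN);
}

/* Old 64-bit check: measures the image from _text. */
static bool old_check_ok(uint64_t end)
{
	assert(end >= text_addr());
	return end - text_addr() <= KERNEL_IMAGE_SIZE;
}

/* New check: measures from LOAD_OFFSET, which is __START_KERNEL_map on 64-bit. */
static bool new_check_ok(uint64_t end)
{
	assert(end >= START_KERNEL_MAP);
	return end - START_KERNEL_MAP <= KERNEL_IMAGE_SIZE;
}
```

With these assumed values, an image whose _end lies 504MiB above _text
satisfies the old check (504MiB <= 512MiB) but not the new one, because
_end is then 520MiB above __START_KERNEL_map and so beyond what the
initial page tables can map.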

If KASLR is disabled (either at build-time or run-time), these
link-time addresses are where the kernel actually lives (in VA space);
if it is enabled, it makes sure to place the _end of the kernel below
__START_KERNEL_map + KERNEL_IMAGE_SIZE when choosing a random virtual
location.
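
As a rough illustration of that constraint (not the kernel's actual
code), the sketch below picks an aligned offset from __START_KERNEL_map
so that offset + image_size never exceeds KERNEL_IMAGE_SIZE. The
function name pick_virt_offset(), the caller-supplied random value, and
the 1GiB KERNEL_IMAGE_SIZE are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; a KASLR-enabled build is taken to use a 1GiB limit. */
#define KERNEL_IMAGE_SIZE (1024ULL << 20)
#define PHYSICAL_ALIGN    0x200000ULL   /* CONFIG_PHYSICAL_ALIGN, 2MiB */

/*
 * Return an offset from __START_KERNEL_map, aligned to PHYSICAL_ALIGN
 * relative to 'minimum', such that the whole image still ends at or
 * below KERNEL_IMAGE_SIZE. 'rnd' stands in for a random number source.
 */
static uint64_t pick_virt_offset(uint64_t minimum, uint64_t image_size,
				 uint64_t rnd)
{
	uint64_t slots;

	/* The image must fit at the lowest allowed position at all. */
	assert(minimum + image_size <= KERNEL_IMAGE_SIZE);

	/* Number of aligned start positions that keep the end in range. */
	slots = (KERNEL_IMAGE_SIZE - minimum - image_size) / PHYSICAL_ALIGN + 1;

	return minimum + (rnd % slots) * PHYSICAL_ALIGN;
}
```

Whatever value of rnd is passed in, the highest slot still leaves room
for the full image below the limit, which is the property the
link-time ASSERT guarantees for the non-randomized case.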

That said, AFAICT, RELOCATABLE and PHYSICAL_START look like historical
artifacts at this point: RELOCATABLE should be completely irrelevant for
the 64-bit kernel, and there's really no reason to be able to configure
the start VA of the kernel; that should just be a constant, independent
of PHYSICAL_START.

Thanks.