Re: kexec reboot failed due to commit 75d090fd167ac

From: Kirill A. Shutemov
Date: Sat Sep 09 2023 - 07:32:18 EST


On Fri, Sep 08, 2023 at 06:17:53PM +0200, Ard Biesheuvel wrote:
> On Fri, Sep 8, 2023 at 5:58 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >
> > On Fri, Sep 08, 2023 at 03:32:33PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Sep 08, 2023 at 02:02:30PM +0800, Aaron Lu wrote:
> > > > On Thu, Sep 07, 2023 at 04:14:09PM +0300, Kirill A. Shutemov wrote:
> > > > > On Tue, Aug 29, 2023 at 10:04:51PM +0800, Aaron Lu wrote:
> > > > > > > Could you show dmesg of the first kernel before kexec?
> > > > > >
> > > > > > Attached.
> > > > > >
> > > > > > BTW, kexec is invoked like this:
> > > > > > kver=6.4.0-rc5-00009-g75d090fd167a
> > > > > > kdir=$HOME/kernels/$kver
> > > > > > sudo kexec -l $kdir/vmlinuz-$kver --initrd=$kdir/initramfs-$kver.img --append="root=UUID=4381321e-e01e-455a-9d46-5e8c4c5b2d02 ro net.ifnames=0 acpi_rsdp=0x728e8014 no_hash_pointers sched_verbose selinux=0"
> > > > >
> > > > > I don't understand why it happens.
> > > > >
> > > > > Could you check if this patch changes anything:
> > > > >
> > > > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > > > > index 94b7abcf624b..172c476ff6f3 100644
> > > > > --- a/arch/x86/boot/compressed/misc.c
> > > > > +++ b/arch/x86/boot/compressed/misc.c
> > > > > @@ -456,10 +456,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> > > > >
> > > > >  	debug_putstr("\nDecompressing Linux... ");
> > > > >
> > > > > +#if 0
> > > > >  	if (init_unaccepted_memory()) {
> > > > >  		debug_putstr("Accepting memory... ");
> > > > >  		accept_memory(__pa(output), __pa(output) + needed_size);
> > > > >  	}
> > > > > +#endif
> > > > >
> > > > >  	__decompress(input_data, input_len, NULL, NULL, output, output_len,
> > > > >  			NULL, error);
> > > > > --
> > > >
> > > > It solved the problem.
> > >
> > > Looks like increasing BOOT_INIT_PGT_SIZE fixes the issue. I don't yet
> > > understand why and how unaccepted memory is involved. I will look more
> > > into it.
> > >
> > > Enabling CONFIG_RANDOMIZE_BASE also makes the issue go away.
> >
> > Is this perhaps just luck? I.e. does it ever break on, say, 1000 boot
> > attempts? (i.e. maybe some position is bad and KASLR happens to usually
> > avoid it?)

Yes, it can be luck.

> > > Kees, maybe you have a clue?
> >
> > The only thing I can think of is that something isn't being counted
> > correctly due to the size of the code, and it just happens that this commit
> > makes the code large enough to exceed some set of mappings?
> >
> > >
> > > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> > > index 9191280d9ea3..26ccce41d781 100644
> > > --- a/arch/x86/include/asm/boot.h
> > > +++ b/arch/x86/include/asm/boot.h
> > > @@ -40,7 +40,7 @@
> > > #ifdef CONFIG_X86_64
> > > # define BOOT_STACK_SIZE 0x4000
> > >
> > > -# define BOOT_INIT_PGT_SIZE (6*4096)
> > > +# define BOOT_INIT_PGT_SIZE (7*4096)
> >
> > That might be why this works, for example? How large is the boot
> > image before/after the commit, etc?
> >
>
> Not sure why these changes would make a difference here, but choking
> on accept_memory() on a non-TDX system suggests that init_unaccepted_memory()
> is poking into unmapped memory before it even decides that the
> unaccepted memory does not exist.
>
> init_unaccepted_memory() has
>
> 	ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
> 	if (ret) {
> 		warn("EFI config table not found.");
> 		return false;
> 	}
>
> which looks for <guid, phys_addr> tuples in an array pointed to by the
> EFI system table, and if either of those is not mapped, things can be
> expected to explode.
>
> The only odd thing there is that this code is invoked after setting up
> the 'demand paging' logic in the decompressor.
>
> If you haven't yet, could you please retry the kexec boot with
> earlyprintk=tty<insert your UART params here>?

early console in extract_kernel
input_data: 0x000000807eb433a8
input_len: 0x0000000000d26271
output: 0x000000807b000000
output_len: 0x0000000004800c10
kernel_total_size: 0x0000000003e28000
needed_size: 0x0000000004a00000
trampoline_32bit: 0x000000000009d000

Decompressing Linux... out of pgt_buf in arch/x86/boot/compressed/ident_map_64.c!?
pages->pgt_buf_offset: 0x0000000000006000
pages->pgt_buf_size: 0x0000000000006000


Error: kernel_ident_mapping_init() failed
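The two counters in that failure decode simply: pgt_buf_offset and pgt_buf_size are both 0x6000, i.e. 6 * 4096, which matches the (6*4096) BOOT_INIT_PGT_SIZE value in the patch quoted above. In other words, every page of the buffer had already been handed out when one more page-table page was requested. Below is a simplified userspace model of that kind of bump allocator. The field names follow the log, but the struct layout, PGT_BUF_SIZE and pgt_storage are assumptions for illustration, not a copy of ident_map_64.c.

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL
/* Same value as the (6*4096) BOOT_INIT_PGT_SIZE in the quoted patch. */
#define PGT_BUF_SIZE (6 * PAGE_SIZE)

/* Field names follow the log output; the layout is an assumption. */
struct alloc_pgt_data {
	unsigned char *pgt_buf;
	unsigned long pgt_buf_size;
	unsigned long pgt_buf_offset;
};

static unsigned char pgt_storage[PGT_BUF_SIZE];

/* Bump allocator: each call hands out the next 4 KiB page of the buffer.
 * Once none is left it prints both counters and returns NULL. */
static void *alloc_pgt_page(struct alloc_pgt_data *pages)
{
	unsigned char *entry;

	if (pages->pgt_buf_offset >= pages->pgt_buf_size) {
		printf("out of pgt_buf!?\n");
		printf("pages->pgt_buf_offset: 0x%016lx\n", pages->pgt_buf_offset);
		printf("pages->pgt_buf_size: 0x%016lx\n", pages->pgt_buf_size);
		return NULL;
	}
	entry = pages->pgt_buf + pages->pgt_buf_offset;
	pages->pgt_buf_offset += PAGE_SIZE;
	return entry;
}
```

With a struct initialized as { pgt_storage, sizeof(pgt_storage), 0 }, six calls succeed. The seventh finds offset and size both at 0x6000 and returns NULL, which is the state the log reports.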

It crashes with a #PF on the stbl->nr_tables dereference in
efi_get_conf_table(), called from init_unaccepted_memory().
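Ard's description of the lookup above (scanning an array of <guid, phys_addr> tuples reached through the EFI system table) boils down to something like the following. It is a hypothetical sketch: guid_sketch_t, conf_table_entry, sys_table_view and find_conf_table() are illustrative names, not the decompressor's helpers. The sketch shows that the scan must read stbl->nr_tables before it can even conclude that no unaccepted-memory table exists. On a non-TDX machine, then, the system table page still has to be mapped. Here that mapping is apparently set up on demand from the #PF, which seems to be where the "out of pgt_buf" failure above comes from.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative layouts only: a 16-byte GUID paired with a 64-bit physical
 * address, and a view of the system table holding the array length and
 * a pointer to the array. Real EFI structures have more fields. */
typedef struct {
	uint8_t b[16];
} guid_sketch_t;

struct conf_table_entry {
	guid_sketch_t guid;
	uint64_t table;		/* physical address of the vendor table */
};

struct sys_table_view {
	uint32_t nr_tables;	/* the read that faulted in the report */
	const struct conf_table_entry *tables;
};

/* Linear scan for 'want'; returns the matching table address, or 0 when
 * the GUID is absent. Even the "absent" answer requires reading
 * stbl->nr_tables and every entry, so all of them must be mapped. */
static uint64_t find_conf_table(const struct sys_table_view *stbl,
				const guid_sketch_t *want)
{
	for (uint32_t i = 0; i < stbl->nr_tables; i++) {
		if (memcmp(&stbl->tables[i].guid, want, sizeof(*want)) == 0)
			return stbl->tables[i].table;
	}
	return 0;
}
```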

I don't see anything special about stbl location: 0x775d6018.

One more data point: disabling 5-level paging also makes the issue go
away.
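A rough count of page-table pages can illustrate why the paging depth would matter. The model below is an assumption-laden sketch, not the decompressor's accounting. It identity-maps with 2 MiB leaves and treats the root table as preallocated. It then counts one new page for every lower-level table (P4D, PUD, PMD) that does not exist yet. The names table_set, table_key(), have_table() and map_range() are hypothetical. When fed only the output range and the stbl address from the log, and nothing else the decompressor maps, it needs 4 pages with 4-level paging and 5 with 5-level paging. The extra page is the P4D table that 5-level paging inserts under the root. This shows the direction of the effect only, not that it is the root cause.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PMD_SIZE (1ULL << 21)	/* leaf mapping size: 2 MiB */
#define MAX_TABLES 64

/* Hypothetical bookkeeping: one record per allocated non-root table,
 * identified by its level (2 = PMD, 3 = PUD, 4 = P4D) and by the
 * address bits above the range that table covers. */
struct table_set {
	int n;
	int level[MAX_TABLES];
	uint64_t key[MAX_TABLES];
};

/* A table at level L covers 2^(12 + 9*L) bytes of address space. */
static uint64_t table_key(int level, uint64_t addr)
{
	return addr >> (PAGE_SHIFT + 9 * level);
}

static bool have_table(const struct table_set *s, int level, uint64_t key)
{
	for (int i = 0; i < s->n; i++)
		if (s->level[i] == level && s->key[i] == key)
			return true;
	return false;
}

/* Count the new table pages needed to identity-map [start, end) with
 * 2 MiB leaves under a hierarchy of 'levels' (4 or 5). The root table
 * is assumed to exist and is not counted. Returns -1 if the model's own
 * bookkeeping array overflows. */
static int map_range(struct table_set *s, int levels, uint64_t start, uint64_t end)
{
	int new_pages = 0;

	for (uint64_t addr = start & ~(PMD_SIZE - 1); addr < end; addr += PMD_SIZE) {
		for (int lvl = levels - 1; lvl >= 2; lvl--) {
			uint64_t key = table_key(lvl, addr);

			if (have_table(s, lvl, key))
				continue;
			if (s->n == MAX_TABLES)
				return -1;
			s->level[s->n] = lvl;
			s->key[s->n] = key;
			s->n++;
			new_pages++;
		}
	}
	return new_pages;
}
```

The output buffer (0x807b000000, needed_size 0x4a00000) sits just above 512 GiB, while stbl (0x775d6018) is below 2 GiB. The two therefore need separate PUD and PMD tables in either mode. The only table they share under 5-level paging is the P4D.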

I will debug further.

--
Kiryl Shutsemau / Kirill A. Shutemov