Re: [PATCH v2] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

From: Dave Young
Date: Wed Apr 03 2019 - 23:22:49 EST


On 04/04/19 at 11:10am, Baoquan He wrote:
> On 04/04/19 at 11:00am, Baoquan He wrote:
> > On 04/04/19 at 10:52am, Dave Young wrote:
> > > On 04/04/19 at 01:23am, Junichi Nomura wrote:
> > > > Hi Dave and Chao,
> > > >
> > > > On 4/3/19 6:02 PM, Chao Fan wrote:
> > > > > On Wed, Apr 03, 2019 at 04:23:06PM +0800, Chao Fan wrote:
> > > > >> On Wed, Apr 03, 2019 at 04:09:16PM +0800, Dave Young wrote:
> > > > >> Fix 3 needs more debugging. Have you or Junichi run tests on more real
> > > > >> hardware? Maybe it is easier to reproduce there. I'm glad to help test
> > > > >> a patch or provide any other help.
> > > > >>
> > > > >> I am still testing on real hardware.
> > > > >
> > > > > Hi Dave,
> > > > >
> > > > > I found a Fujitsu Desktop PC to test it on.
> > > > > Without this PATCH, both kexec and kdump failed.
> > > > > With this PATCH, kexec succeeded,
> > > > > but kdump still failed. From the log, I think it never jumped to the second
> > > > > kernel and just rebooted after the panic. I have not figured out what the
> > > > > problem is, but it does not seem to be caused by this PATCH.
> > > > > So I still think this PATCH works for the Fujitsu Desktop PC.
> > > > >
> > > > > As for your issue, I think there may be some problems related to specific
> > > > > hardware. Are you using a Lenovo laptop?
> > > > >
> > > > > And I am not sure how Nomura tested it.
> > > >
> > > > I've tested 3 different models of EFI-booted bare-metal servers with both
> > > > normal kexec and panic kexec. As far as I've tried with Linus's v5.1-rc3,
> > > > the problem always reproduces without the patch and disappears with the patch.
> > >
> > > Hmm, both of my laptops (ThinkPad T480s and T420) fail to boot with kexec.
> > >
> > > I will see if I can find something, but it may take more time because
> > > the early console does not work, especially after kexec.
> >
> > Dave, can you try the patch below to print a debugging message and hang the
> > kernel so you can check the output? The hang is necessary; otherwise later
> > printk output will overwrite it.
> >
> > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> > index c0d6c560df69..68119547c4aa 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -351,9 +351,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> >  	/* Clear flags intended for solely in-kernel use. */
> >  	boot_params->hdr.loadflags &= ~KASLR_FLAG;
> >
> > -	/* Save RSDP address for later use. */
> > -	boot_params->acpi_rsdp_addr = get_rsdp_addr();
> > -
> >  	sanitize_boot_params(boot_params);
> >
> >  	if (boot_params->screen_info.orig_video_mode == 7) {
> > @@ -370,6 +367,10 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
> >  	console_init();
> >  	debug_putstr("early console in extract_kernel\n");
> >
> +	/* Save RSDP address for later use. */
> +	boot_params->acpi_rsdp_addr = get_rsdp_addr();
> +
> +	error("Hang kernel for kexec debugging");
>
> Sorry, what I meant here is calling error() to hang the kernel after
> calling get_rsdp_addr().

Thanks. It did not hang; it always reset to the firmware/GRUB boot menu.
Since the reset happens before error() is ever reached, I'm now pretty sure
the bug is in get_rsdp_addr().
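For anyone following along, below is a rough user-space sketch of the checks an RSDP search typically performs before trusting a candidate address: the "RSD PTR " signature, the ACPI 1.0 checksum over the first 20 bytes, and, for revision 2 and later, the extended checksum over the Length field's byte count. It is based only on the ACPI RSDP layout and is NOT the boot code's get_rsdp_addr(); the helpers acpi_checksum(), rsdp_is_valid() and scan_for_rsdp() are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch only (hypothetical helpers, not kernel code).
 * Offsets follow the ACPI Root System Description Pointer layout:
 * Signature[0..7], Checksum[8], OEMID[9..14], Revision[15],
 * RsdtAddress[16..19], Length[20..23] (rev >= 2), ... 36 bytes total.
 */
#define RSDP_SIG      "RSD PTR "  /* 8 bytes, note the trailing space */
#define RSDP_V1_LEN   20u         /* ACPI 1.0 checksum covers 20 bytes */
#define RSDP_V2_LEN   36u         /* full ACPI 2.0+ structure size */
#define RSDP_REV_OFF  15u         /* Revision byte offset */

/* Byte sum modulo 256; a valid ACPI structure sums to zero. */
static uint8_t acpi_checksum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/* Return 1 if buf[0..size) starts with a plausible RSDP, else 0. */
static int rsdp_is_valid(const uint8_t *buf, size_t size)
{
	uint32_t len;

	if (size < RSDP_V1_LEN || memcmp(buf, RSDP_SIG, 8) != 0)
		return 0;
	if (acpi_checksum(buf, RSDP_V1_LEN) != 0)
		return 0;
	if (buf[RSDP_REV_OFF] < 2)
		return 1;	/* ACPI 1.0: no extended fields to check */

	/* ACPI 2.0+: little-endian Length, then extended checksum. */
	if (size < RSDP_V2_LEN)
		return 0;
	len = (uint32_t)buf[20] | (uint32_t)buf[21] << 8 |
	      (uint32_t)buf[22] << 16 | (uint32_t)buf[23] << 24;
	if (len < RSDP_V2_LEN || len > size)
		return 0;
	return acpi_checksum(buf, len) == 0;
}

/* Scan a memory window on 16-byte boundaries, as the legacy BIOS search does. */
static const uint8_t *scan_for_rsdp(const uint8_t *base, size_t size)
{
	size_t off;

	for (off = 0; off + RSDP_V1_LEN <= size; off += 16)
		if (rsdp_is_valid(base + off, size - off))
			return base + off;
	return NULL;
}
```

A boundary scan like this only applies to the legacy BIOS areas; on EFI systems the RSDP address comes from the EFI configuration table instead, so the signature and checksum validation is the part that carries over.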