Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels
From: Dave Young
Date: Wed Apr 17 2019 - 03:08:30 EST
On 04/17/19 at 02:00pm, Kairui Song wrote:
> On Wed, Apr 17, 2019 at 12:57 PM Dave Young <dyoung@xxxxxxxxxx> wrote:
> >
> > On 04/17/19 at 09:38am, Dave Young wrote:
> > > On 04/16/19 at 03:22pm, Borislav Petkov wrote:
> > > > On Tue, Apr 16, 2019 at 07:41:33PM +0800, Dave Young wrote:
> > > > > On 04/16/19 at 11:52am, Borislav Petkov wrote:
> > > > > > I'll queue the below in the next days if there are no more complaints:
> > > > >
> > > > > As for the kexec breakage, even with the V3 patch, kexec still hangs on
> > > > > a Lenovo T420 laptop. Kairui also reproduced the problem. So can we
> > > > > wait a few days see if we can make some progress to find the cause?
> > > >
> > > > How is applying this patch going to change anything?
> > > >
> > > > I was told that the breakage is there even without it...
> > >
> > > Without this patch, the bug happens in the efi_get_rsdp.. function, this
> > > patch tries to fix that by adding kexec_get.. but the new introduced
> > > kexec_* function does not work on some laptops, so it is not a 100% good
> > > fix, I hoped we can get it working for all known issues. But if we can
> > > not do it eg. within one week we can go with this version and leave the
> > > laptop issue as a known issue.
> > >
> >
> > Latest debugging status:
> >
> > Kexec boot works with commenting out some code like below, so the guid
> > cmp (memcmp) caused a system reset), still need to find out why:
> >
> > diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> > index d9f9abd63c68..13e7a23ae94c 100644
> > --- a/arch/x86/boot/compressed/acpi.c
> > +++ b/arch/x86/boot/compressed/acpi.c
> > @@ -95,10 +95,12 @@ __efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
> > table = tbl->table;
> > }
> >
> > +/*
> > if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> > rsdp_addr = table;
> > else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> > return table;
> > +*/
> > }
> >
> > return rsdp_addr;
> > @@ -291,9 +293,10 @@ acpi_physical_address get_rsdp_addr(void)
> > if (!pa)
> > pa = kexec_get_rsdp_addr();
> >
> > +/*
> > if (!pa)
> > pa = efi_get_rsdp_addr();
> > -
> > +*/
> > if (!pa)
> > pa = bios_get_rsdp_addr();
> >
> >
>
> Hi Dave, for this case I think it's just because GCC will found the
> loop does nothing, and optimize out the whole loop in
> __efi_get_rsdp_addr and will no longer read the actual nr_table value.
>
> I can fix the boot error on T420 with your patch, but if I add
> anything, like a hardcode value assignment with the right value for
> acpi_rsdp in the loop, it will reset the machine. But set acpi_rsdp
> with a right initial value out side the loop works fine.
> If the loop condition is false, then there should be no difference
> between just comment out the line you mentioned and add an assignment.
> Else it just assign the value multiple times, not very reasonable but
> shouldn't fail.
>
> And, I inspected the generated ASM code also suggest the same thing.
> So still, access the systab memory is the cause of the system reset on
> certain machines.
Makse sense, my previous debug also point to some systab accessing.
Probably some early pg table mess up.
Thanks
Dave