Re: [lkp] [ACPI] 7494b07eba: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

From: Rafael J. Wysocki
Date: Thu Oct 08 2015 - 16:09:25 EST


On Thursday, October 08, 2015 10:36:40 AM Al Stone wrote:
> On 10/08/2015 05:44 AM, Hanjun Guo wrote:
> > On 10/08/2015 11:21 AM, kernel test robot wrote:
> >> FYI, we noticed the below changes on
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >> commit 7494b07ebaae2117629024369365f7be7adc16c3 ("ACPI: add in a
> >> bad_madt_entry() function to eventually replace the macro")
> >>
> >> [ 0.000000] ACPI: undefined MADT subtable type for FADT 4.0: 127 (length 12)
> >
> > It seems that the MADT contains a reserved subtable type (0x7F),
> > so it is treated as a wrong type in our patch.
> >
> >> [ 0.000000] ACPI: Error parsing LAPIC address override entry
> >
> > This was called by early_acpi_parse_madt_lapic_addr_ovr() in
> > arch/x86/kernel/acpi/boot.c, which is scanning MADT for the first
> > time when booting, so it will fail the boot process when finding
> > the reserved MADT subtable type.
> >
> >> [ 0.000000] ACPI: Invalid BIOS MADT, disabling ACPI
> >
> > As the spec says in Table 5-46 (ACPI 6.0):
> >
> > 0x10-0x7F Reserved. OSPM skips structures of the reserved type.
> >
> > Should we just ignore those reserved types when scanning the MADT?
> > In the patch "ACPI: add in a bad_madt_entry() function to
> > eventually replace the macro", we simply treat them as wrong, and
> > that is why the system failed to boot.
> >
> > Thanks
> > Hanjun
>
> Arrgh. This is why people get frustrated with ACPI. The spec is
> saying that those sub-table types are reserved -- implying they can
> and probably will be used for something else in the future -- but
> then vendors are shipping firmware that uses those reserved values,
> and an OS *expects* them to be used, and there is *no* documentation
> of it other than a kernel workaround.
>
> So yet again, technically this MADT subtable *is* wrong, and someone
> should slap the vendor for doing this. But, the practical side of
> this is that we now have to work around what is now a known violation
> of the spec.
>
> The more ACPI allows this kind of nonsense, the less usable it will
> become.

Linux Kernel Developer's First Rule: You shall not break setups that
worked previously, even if they worked by accident.

IOW, if something booted and your commit made it not boot any more, it counts
as a regression and needs to be modified or reverted.
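
For illustration only, here is a minimal sketch of a MADT scan that skips
reserved subtable types instead of rejecting the whole table. It is not the
code from the commit under discussion or from arch/x86/kernel/acpi/boot.c.
Every name in it (madt_classify_entry(), madt_count_entries(), the enum and
the macros) is hypothetical, and the reserved range is simply the 0x10-0x7F
range quoted from the spec above. Types outside that range are accepted here
as long as they are structurally sound.

```c
#include <stddef.h>
#include <stdint.h>

/* Reserved range as quoted above from Table 5-46 (ACPI 6.0). */
#define MADT_RESERVED_FIRST 0x10
#define MADT_RESERVED_LAST  0x7F

/* Every MADT subtable starts with a one-byte type and a one-byte length. */
#define MADT_SUBTABLE_HDR_LEN 2

/* Hypothetical verdicts; not names used by the kernel. */
enum madt_verdict { MADT_ENTRY_OK, MADT_ENTRY_SKIP, MADT_ENTRY_BAD };

/*
 * Classify one subtable. Only structural problems (truncated header,
 * length shorter than the header or running past the table) count as
 * BAD; a reserved type is merely skipped.
 */
static enum madt_verdict madt_classify_entry(const uint8_t *entry,
                                             size_t remaining)
{
    uint8_t type, length;

    if (remaining < MADT_SUBTABLE_HDR_LEN)
        return MADT_ENTRY_BAD;

    type = entry[0];
    length = entry[1];

    if (length < MADT_SUBTABLE_HDR_LEN || length > remaining)
        return MADT_ENTRY_BAD;

    if (type >= MADT_RESERVED_FIRST && type <= MADT_RESERVED_LAST)
        return MADT_ENTRY_SKIP;

    return MADT_ENTRY_OK;
}

/*
 * Walk the subtable area and count entries of the wanted type.
 * Returns -1 only if an entry is structurally broken.
 */
static int madt_count_entries(const uint8_t *buf, size_t len, uint8_t wanted)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        enum madt_verdict v = madt_classify_entry(buf + off, len - off);

        if (v == MADT_ENTRY_BAD)
            return -1;
        if (v == MADT_ENTRY_OK && buf[off] == wanted)
            count++;

        /* Length was validated above, so this always advances. */
        off += buf[off + 1];
    }
    return count;
}
```

With a scan shaped like this, a type 127 entry of length 12 such as the one
in the log above would be stepped over, and the entries around it would still
be parsed.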

Thanks,
Rafael
