Re: [PATCH] hack to debug acpiphp crash

From: Igor Mammedov
Date: Tue Jul 25 2023 - 04:07:37 EST


On Mon, 24 Jul 2023 21:52:34 -0400
Woody Suwalski <terraluna977@xxxxxxxxx> wrote:

> Igor Mammedov wrote:
> > Woody, thanks for testing,
> >
> > can you try the following patch? It tries to work around a NULL bus->self,
> > in case that really is the culprit, and prints extra debug information.
> > Add the following to the kernel command line (make sure that CONFIG_DYNAMIC_DEBUG is enabled):
> >
> > dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
> >
> > What I find odd in your logs is that enable_slot() is called while native PCIe
> > hotplug should be used. Additional info might help to understand what's going on:
> > 1: 'lspci' output
> > 2: DSDT and all SSDT ACPI tables (you can use 'acpidump -b' to get them).
> >
> > Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
[...]
> >
> > /**
> Unfortunately the patch above does not seem to prevent the kernel crash.
> Here comes the requested diagnostic info: dmesg's before and after,
> choice of lspci's and acpi tables. Hope that will help :-)

Looking at dmesg-6.5-debug_after.txt, there is no
"BUG: kernel NULL pointer dereference" line anymore.
The call traces you see are produced by WARN(), whose purpose is
to show the call path that leads to enable_slot().

Let me split the potential fix from the debug output and repost them
as separate patches for you to try.
I'd like to see the debug output without the 'fix' to track down which
root port/device causes the NULL pointer dereference, and hopefully
figure out in a few round trips why the old code doesn't crash.

PS:
What happens is that on resume the firmware (likely the EC)
issues an ACPI bus check on the root ports. That bus check is
wired to the acpiphp module, even though the pciehp module was
initialized at boot to manage the root ports. It's likely a firmware bug.

I'd guess the intent behind this was to check whether PCIe devices
were hotplugged while the laptop was asleep, and for some reason
they didn't use native PCIe hotplug to handle that.
However, looking at the laptop specs, you can't hotplug PCIe
devices via external ports. Given how old the laptop is,
the firmware isn't going to be fixed, so we would need a workaround
or a DSDT fixup to skip the bus check.

The options I see are to keep the old kernel's behavior for such a case,
or to bail out early from bus check/enable_slot() since the root port
is managed by the pciehp module (and let it handle hotplug).
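
To illustrate the second option, here is a minimal userspace sketch of
the early bail-out. It is not the actual patch and does not use the real
kernel structures: struct model_dev, struct model_bus, the
native_pcie_hotplug flag and model_bus_check() are all hypothetical
names made up for illustration. It assumes a bus check should be
ignored by the ACPI hotplug path when there is no bridge device behind
the bus, or when the bridge is already managed natively.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct model_dev {
	/* Assumption: true when native PCIe hotplug claimed the port at boot. */
	bool native_pcie_hotplug;
};

struct model_bus {
	/* Bridge leading to this bus; may be NULL, as in the reported crash. */
	struct model_dev *self;
};

enum bus_check_result {
	BUS_CHECK_SKIPPED_NO_BRIDGE,	/* nothing to dereference, bail out */
	BUS_CHECK_SKIPPED_NATIVE,	/* leave hotplug to the native driver */
	BUS_CHECK_HANDLED_BY_ACPI,	/* ACPI hotplug path may enable the slot */
};

/*
 * Decide what to do with a firmware-issued bus check.
 * Both guards run before anything would touch bus->self.
 */
enum bus_check_result model_bus_check(const struct model_bus *bus)
{
	if (bus == NULL || bus->self == NULL)
		return BUS_CHECK_SKIPPED_NO_BRIDGE;

	if (bus->self->native_pcie_hotplug)
		return BUS_CHECK_SKIPPED_NATIVE;

	return BUS_CHECK_HANDLED_BY_ACPI;
}
```

Only the last case would go on to the enable_slot() equivalent. Treating
a NULL bridge as "skip" is a simplification of this sketch; whether that
is the right behavior for a real root bus still needs the debug output.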

> Thanks, Woody
>
>