Re: [PATCH 1/1] PCI: acpiphp:: use pci_assign_unassigned_bridge_resources() only if bus->self not NULL

From: Bjorn Helgaas
Date: Mon Aug 07 2023 - 13:29:04 EST


On Mon, Aug 07, 2023 at 03:07:46PM +0200, Igor Mammedov wrote:
> On Fri, 4 Aug 2023 18:27:09 -0500
> Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote:
> > > Commit [1] switched acpiphp hotplug to use
> > > pci_assign_unassigned_bridge_resources()
> > > which depends on bridge being available, however in some cases
> > > when acpiphp is in use, enable_slot() can get a slot without
> > > bridge associated.
> >
> > acpiphp is *always* in use if we get to enable_slot(), so that doesn't
> > really add information here.
> >
> > > 1. legitimate case of hotplug on root bus
> > > (likely not exiting on real hw, but widely used in virt world)
> > > 2. broken firmware, that sends 'Bus check' events to non
> > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow
> > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus
> > > without bridge assigned to it.
>
> how about following commit message (incorporating all feed back in this thread):

I incorporated this commit log and put the patch on for-linus for
v6.5. I think the patch is fine, and we can amend the commit log
again if necessary.

> -- cut --
> Commit [1] switched acpiphp hotplug to use
> pci_assign_unassigned_bridge_resources()
> which depends on bridge being available, however in case
> when acpiphp is enabled [2], enable_slot() can be called without
> bridge associated.
> 1. legitimate case of hotplug on root bus
> (widely used in virt world)
> 2. a (misbehaving) firmware, that sends 'Bus check' events
> to non existing root ports (Dell Inspiron 7352/0W6WV0),
> which endup at acpiphp:enable_slot(..., bridge = 0)
> where bus has not bridge assigned to it.
> acpihp doesn't know that it's a bridge, and bus specific
> 'PCI subsystem' can't argument ACPI context with bridge
> information since the PCI device to get this data from
> is/was not available.
>
> Issue is easy to reproduce with QEMU's 'pc' machine, which supports
> PCI hotplug on hostbridge slots. To reproduce boot kernel at
> commit [1] in VM started with following CLI (assuming guest root fs
> is installed on sda1 partition):
>
> # qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \
> -monitor stdio -serial file:serial.log \
> -kernel arch/x86/boot/bzImage \
> -append "root=/dev/sda1 console=ttyS0" \
> guest_disk.img
>
> once guest OS is fully booted at qemu prompt:
>
> (qemu) device_add e1000
>
> (check serial.log) it will cause NULL pointer dereference at:
>
> void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> {
> struct pci_bus *parent = bridge->subordinate;
>
> [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [...]
> [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260
> [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0
> [ 612.277809] enable_slot+0x21f/0x3e0
> [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260
> [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10
> [ 612.277827] acpi_device_hotplug+0xbc/0x540
> [ 612.277834] acpi_hotplug_work_fn+0x15/0x20
> [ 612.277839] process_one_work+0x1f7/0x370
> [ 612.277845] worker_thread+0x45/0x3b0
> [ 612.277850] ? __pfx_worker_thread+0x10/0x10
> [ 612.277854] kthread+0xdc/0x110
> [ 612.277860] ? __pfx_kthread+0x10/0x10
> [ 612.277866] ret_from_fork+0x28/0x40
> [ 612.277871] ? __pfx_kthread+0x10/0x10
> [ 612.277876] ret_from_fork_asm+0x1b/0x30
>
> The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with
> following sequence:
> 1. suspend to RAM
> 2. wake up with the same backtrace being observed:
> 3. 2nd suspend to RAM attempt makes laptop freeze
>
> Fix it by using __pci_bus_assign_resources() instead of
> pci_assign_unassigned_bridge_resources() as we used to do
> but only in case when bus doesn't have a bridge associated
> (to cover for the case of ACPI event on hostbridge or
> non existing root port).
>
> That let us keep hotplug on root bus working like it used to be
> and at the same time keeps resource reassignment usable on
> root ports (and other 1st level bridges) that was fixed by [1].
>
> 1)
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> 2) CONFIG_HOTPLUG_PCI_ACPI=y
>
> -- cut --
>
> if commit message is looking acceptable to you, I can respin
> amended patch with your suggestions taken in account.
>
> > IIUC, the Inspiron problem happens when:
> >
> > - acpiphp_context->bridge is NULL, so hotplug_event() calls
> > enable_slot() instead of acpiphp_check_bridge(), AND
> >
> > - acpiphp_slot->bus->self is also NULL, because enable_slot() calls
> > pci_assign_unassigned_bridge_resources() with that NULL pointer,
> > which dereferences "bridge->subordinate"
> >
> > But I can't figure out why acpiphp_context->bridge is NULL for RP07
> > and RP08 (which don't exist), but not for RP03 (which does).
> >
> > I guess all the acpiphp_contexts (RP03, RP07, RP08) must be allocated in
> > acpiphp_add_context() by acpiphp_init_context().
> >
> > Woody's lspci from [1] shows only one Root Port:
> >
> > 00:1c.0 Wildcat Point-LP PCI Express Root Port #3
> >
> > The DSDT.DSL includes:
> >
> > Device (RP01) _ADR 0x001C0000 # 1c.0
> > Device (RP02) _ADR 0x001C0001 # 1c.1
> > Device (RP03) _ADR 0x001C0002 # 1c.2
> > Device (RP04) _ADR 0x001C0003 # 1c.3
> > Device (RP05) _ADR 0x001C0004 # 1c.4
> > Device (RP06) _ADR 0x001C0005 # 1c.5
> > Device (RP07) _ADR 0x001C0006 # 1c.6
> > Device (RP08) _ADR 0x001C0007 # 1c.7
> >
> > I can see why we might need a Bus Check after resume to see if
> > something got added while we were suspended. But I don't see why we
> > handle RP03 differently from RP07 and RP08.
> >
> > Can you help me out? I'm lost in a maze of twisty passages, all
> > alike.
>
> I'll try to trace call path and see where it leads
> (based on a guess in updated commit message: OS/nor ACPI
> has info if it's bridge when the device didn't exists
> during boot).
>
> (though, I don't think it should hold this patch.
> while it would be good to understand where bridge
> gets added in acpi context, it's not directly relevant
> to the fixing hotplug on hostbridge and buscheck events
> on non-existing root-ports)
>
> > Bjorn
> >
> > [1] https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@xxxxxxxxx
> >
>
>