Re: [PATCH 1/1] PCI: acpiphp:: use pci_assign_unassigned_bridge_resources() only if bus->self not NULL

From: Igor Mammedov
Date: Mon Jul 31 2023 - 08:45:17 EST


On Sat, 29 Jul 2023 16:50:09 -0500
Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Fri, Jul 28, 2023 at 11:32:16AM +0200, Igor Mammedov wrote:
> > On Thu, 27 Jul 2023 12:41:02 -0500 Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote:
> > > > Commit [1] switched acpiphp hotplug to use
> > > > pci_assign_unassigned_bridge_resources()
> > > > which depends on bridge being available, however in some cases
> > > > when acpiphp is in use, enable_slot() can get a slot without
> > > > bridge associated.
> > > > 1. legitimate case of hotplug on root bus
> > > > (likely not exiting on real hw, but widely used in virt world)
> > > > 2. broken firmware, that sends 'Bus check' events to non
> > > > existing root ports (Dell Inspiron 7352/0W6WV0), which somehow
> > > > endup at acpiphp:enable_slot(..., bridge = 0) and with bus
> > > > without bridge assigned to it.
> > >
> > > Do we have evidence about the details of this non-existent root port?
> > > If we do, I think it would be interesting to include a URL to them in
> > > case there's some hole in the way we handle Bus Check events.
> >
> > it's scattered over logs Woody has provided, here are links to
> > emails with
> > 1: lspci output
> > https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@xxxxxxxxx/
> >
> > according to lscpi and dmesg there is only one root-port at 1c.0
> > which is occupied by wifi card
> >
> > while DSTD table has more ports described, which is fine as long as
> > missing/disabled are not reported as present.
> >
> > 2: last round of logs with debug patch /before 40613da5, with 40613da5, and after/
> > https://lore.kernel.org/r/46437825-3bd0-2f8a-12d8-98a2b54d7c22@xxxxxxxxx/
> >
> > here dmesg shows 1st correct port
> > ACPI: \_SB_.PCI0.RP03: acpiphp_glue: Bus check in hotplug_event(): bridge: 000000000dad0b34
> > and then later on
> > ACPI: \_SB_.PCI0.RP07: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000
> > ACPI: \_SB_.PCI0.RP08: acpiphp_glue: Bus check in hotplug_event(): bridge: 0000000000000000
> > which aren't recognized as bridge
>
> Thanks, that does seem a little suspect. ACPI r6.5 sec 5.6.6 says
> that when OSPM handles a Bus Check, it should "perform a Plug and Play
> re-enumeration operation on the device tree starting from the point
> where it has been notified."
>
> PCI devices are enumerated by doing PCI config reads. It would make
> sense to re-enumerate a PCI hierarchy starting with a PCI device
> that's already known to the OS, e.g., by scanning the secondary bus of
> a PCI-to-PCI bridge.
>
> I think there are two problems here:
>
> 1) The platform shouldn't send a Bus Check notification to a PCI
> device that doesn't exist. How could the OS re-enumerate
> starting there?

in case of reported laptop, DSDT provides Device Descriptors
for not existing root-ports.

OSPM can't do anything with it but to pass Notify event to
PCI bus-specific enumeration mechanism, and it's upto PCI subsystem
to discard/ignore Notify() on this ACPI node.

I think I can mock this case by hacking QEMU, that should help
with finding a proper place to fix it.

(here it is:
https://gitlab.com/imammedo/qemu/-/commits/acpiphp_buscheck_on_missing_device?ref_type=heads
I'll post hack patch as a reply to this email for posterity)

> 2) Linux runs acpiphp_hotplug_notify() for Bus Checks to
> non-existent PCI devices when it ignore them; reasoning below.
>
> We call acpiphp_enumerate_slots() in this path, which happens before
> any of the PCI devices on the root bus have been enumerated:
>
> pci_register_host_bridge
> pcibios_add_bus(root bus)
> acpi_pci_add_bus
> acpiphp_enumerate_slots(pci_bus *bus)
> acpi_walk_namespace(acpiphp_add_context)
> acpiphp_add_context(struct acpiphp_bridge *)
> acpi_evaluate_integer("_ADR")
> acpiphp_init_context
> context->hp.notify = acpiphp_hotplug_notify
>
> So now we've already looked at RP03, RP07, and RP08, and set up the
> .notify() handler for all of them. Since we haven't scanned the bus
> yet, we don't know that RP03 exists and RP07 and RP08 do not.

While ACPI doesn't forbid firmware to describe non-existing RP,
the PCIe hostbridge can't hotplug extra root ports. (and QEMU
follows PCIe design in this respect on 'q35' machine).

However when it comes to hotplug on QEMU's 'pc' machine
(hotplug on root bus), each slot has "Augmented Device
Descriptors", that includes un-populated slots leading to
the presence of .notify() handler on such slots.

Then later on when device is hotplugged, a Notify(,1/*DeviceCheck*/)
is sent to previously empty slot and from there on PCI subsystem
re-enumerates either a single device or a bridge hierarchy
(from the parent context).

So I'd assume that we need to have .notify() handler for all slots
that are described in DSDT (present and non present).

> Per ACPI r6.5, sec 6, all these Device objects are "Augmented Device
> Descriptors":
>
> An Agumented [sic] Device Descriptor, which contains additional
> device information that is not provided from the Device itself, yet
> is needed by the Device or Bus driver in order to properly configure
> and use the device. This type of device is enumerated by a
> bus-specific enumeration mechanism, and OSPM uses the Address (_ADR)
> to match the ACPI Device object in the Namespace to the device
> discovered through bus enumeration.
>
> I think that means OSPM should discover a PCI device using the PCI
> bus-specific enumeration mechanism (i.e., config reads) before it even
> looks for a corresponding ACPI Device object, and it should only set
> up .notify() for PCI devices that actually exist, so the Bus Checks on
> RP07 and RP08 would be ignored and we wouldn't even get into the path
> that causes the NULL pointer dereference:
>
> acpi_device_hotplug
> acpiphp_hotplug_notify # from hp.notify
> hotplug_event
> bridge = context->bridge
> case BUS_CHECK:
> if (bridge)
> acpiphp_check_bridge
> else if (!SLOT_IS_GOING_AWAY)
> enable_slot
> bus = slot->bus # "bus" is a root bus
> pci_assign_unassigned_bridge_resources(bus->self)
> bridge = bus->self # "bridge" is NULL since
> # bus->self is NULL for root buses
> struct pci_bus *parent = bridge->subordinate
> # NULL pointer dereference
>
> Obviously none of this helps solve the current regression. Changing
> the .notify() setup would be a big change, it would be risky because
> it might affect dock support, and it still wouldn't fix your case 1 of
> hotplug on the root bus in a virtualized environment.
>
> > > > Issue is easy to reproduce with QEMU's 'pc' machine provides
> > > > PCI hotplug on hostbridge slots. to reproduce boot kernel at
> > > > commit [1] in VM started with followin CLI and hotplug a device:
> > >
> > > You mention CLI; did you mean to include a qemu command line here?
> > > Maybe it's the same thing mentioned in the 40613da52b13 commit log?
> > > I tried briefly to reproduce this using the 40613da52b13 command line
> > > but haven't quite got it going yet. I think it would be very useful
> > > to either include it here again or point to the 40613da52b13 commit
> > > log.
> >
> > my bad, I didn't realize that saying 'pc' machine is not sufficient.
> >
> > minimal CLI can be (important part '-M pc -monitor stdio',
> > the rest is for making guest boot and run at tolerable speed):
> >
> > $QEMU -M pc -m 4G -monitor stdio -cpu host --enable-kvm vm_disk_image
> >
> > Will you amend commit message or shall I repost with changes/Acks?
>
> I'll give it a shot and post it for your comments.
>
> > > > once guest OS is fully booted at qemu prompt:
> > > >
> > > > (qemu) device_add e1000
> > > >
> > > > it will cause NULL pointer dereference at
> > > >
> > > > void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
> > > > {
> > > > struct pci_bus *parent = bridge->subordinate;
>
> This worked for me (after setting CONFIG_HOTPLUG_PCI_ACPI=y :)):
>
> $ qemu-system-x86_64 -M pc -m 512M -monitor stdio -cpu host --enable-kvm -kernel arch/x86/boot/bzImage -drive format=raw,file=ubuntu.img -append "root=/dev/sda1"
> (qemu) device_add e1000
>
> (For posterity, replacing "-monitor stdio" with "-nographic -monitor
> telnet:localhost:7001,server,nowait,nodelay" and adding
> "console=ttyS0,115200n8" to the -append made it easier to see the
> crash details.)

I've not put extra arguments, because there is a lot of ways
one can configure/use monitor/serial options.

But specifying full command line like yours will be useful
for anyone who doesn't have any experience with QEMU CLI.

> > > > [ 612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > > > [...]
> > > > [ 612.277798] ? pci_assign_unassigned_bridge_resources+0x1f/0x260
> > > > [ 612.277804] ? pcibios_allocate_dev_resources+0x3c/0x2a0
> > > > [ 612.277809] enable_slot+0x21f/0x3e0
> > > > [ 612.277816] acpiphp_hotplug_notify+0x13d/0x260
> > > > [ 612.277822] ? __pfx_acpiphp_hotplug_notify+0x10/0x10
> > > > [ 612.277827] acpi_device_hotplug+0xbc/0x540
> > > > [ 612.277834] acpi_hotplug_work_fn+0x15/0x20
> > > > [ 612.277839] process_one_work+0x1f7/0x370
> > > > [ 612.277845] worker_thread+0x45/0x3b0
> > > > [ 612.277850] ? __pfx_worker_thread+0x10/0x10
> > > > [ 612.277854] kthread+0xdc/0x110
> > > > [ 612.277860] ? __pfx_kthread+0x10/0x10
> > > > [ 612.277866] ret_from_fork+0x28/0x40
> > > > [ 612.277871] ? __pfx_kthread+0x10/0x10
> > > > [ 612.277876] ret_from_fork_asm+0x1b/0x30
> > > >
> > > > The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with
> > > > following sequence:
> > > > 1. suspend to RAM
> > > > 2. wake up with the same backtrace being observed:
> > > > 3. 2nd suspend to RAM attempt makes laptop freeze
> > > >
> > > > Fix it by using __pci_bus_assign_resources() instead of
> > > > pci_assign_unassigned_bridge_resources()as we used to do
> > > > but only in case when bus doesn't have a bridge associated
> > > > with it.
> > > >
> > > > That let us keep hotplug on root bus working like it used to be
> > > > but at the same time keeps resource reassignment usable on
> > > > root ports (and other 1st level bridges) that was fixed by [1].
> > > >
> > > > 1)
> > > > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > > > Link: https://lore.kernel.org/r/11fc981c-af49-ce64-6b43-3e282728bd1a@xxxxxxxxx
> > > > Reported-by: Woody Suwalski <terraluna977@xxxxxxxxx>
> > > > Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx>
> > > > ---
> > > > drivers/pci/hotplug/acpiphp_glue.c | 8 +++++++-
> > > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > > > index 328d1e416014..3bc4e1f3efee 100644
> > > > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > > > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > > > @@ -498,6 +498,7 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
> > > > acpiphp_native_scan_bridge(dev);
> > > > }
> > > > } else {
> > > > + LIST_HEAD(add_list);
> > > > int max, pass;
> > > >
> > > > acpiphp_rescan_slot(slot);
> > > > @@ -511,10 +512,15 @@ static void enable_slot(struct acpiphp_slot *slot, bool bridge)
> > > > if (pass && dev->subordinate) {
> > > > check_hotplug_bridge(slot, dev);
> > > > pcibios_resource_survey_bus(dev->subordinate);
> > > > + if (!bus->self)
> > > > + __pci_bus_size_bridges(dev->subordinate, &add_list);
> > > > }
> > > > }
> > > > }
> > > > - pci_assign_unassigned_bridge_resources(bus->self);
> > > > + if (bus->self)
> > > > + pci_assign_unassigned_bridge_resources(bus->self);
> > > > + else
> > > > + __pci_bus_assign_resources(bus, &add_list, NULL);
>
> I really wish we didn't have such different resource assignment paths
> depending on whether the device is on a root bus or deeper in the
> hierarchy. But we can't fix that now, so this seems like the right
> thing.

I've looked at possibility of making
pci_assign_unassigned_bridge_resources()
work without bridge pointer, but it looks not viable as it's
a bridge dedicated function which on top of rearranging
resources, also disables/reprograms/enables bridge.

If there are ideas how to make it better,
I can pick it up and try to implement.

Testing shows that pci_assign_unassigned_bridge_resources()
isn't ideal since it releases all resources before reassigning
and then if the later fails bridge stays in mis-configured
state (attempt to recover results in failing BAR assignment
to children devices).
It's not issue in case of
root-port -> 1 child device hotplug
since root port hadn't any working device[s] behind it.
But in case of hotplug into PCI bridge, that leaves
pre-existing devices behind the bridge broken (SHPC and acpiphp case).

> But would you be OK with this minor mod?
>
> if (pci_is_root_bus(bus))
> __pci_bus_size_bridges(dev->subordinate, &add_list);
>
> ...
>
> if (pci_is_root_bus(bus))
> __pci_bus_assign_resources(bus, &add_list, NULL);
> else
> pci_assign_unassigned_bridge_resources(bus->self);
>
> For two reasons: (1) test the same condition both places, and (2) be a
> little more explicit about the scenario (and "bus->self == NULL" also
> happens for the virtual buses added for SR-IOV).

works for me, tested with:
1: hotplug on root bus with QEMU's 'pc' machine,
2: (q35 machine) igb emulation, and adding VFs once guest is booted
3: BusCheck on missing PCI device (aka simulated env of reporter)
4: (q35 machine) resource re-allocation still works on root ports when
a device with large BARs is hotplugged

> > > > }
> > > >
> > > > acpiphp_sanitize_bus(bus);
> > > > --
> > > > 2.39.3
> > > >
> > >
> >
>