Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"

From: Johan Hovold
Date: Mon Jan 08 2024 - 03:39:21 EST


Hi Bjorn,

On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>
> This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
>
> Michael reported that when attempting to resume from suspend to RAM on ASUS
> mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> with no output, followed by a reboot.
>
> Workarounds include:
>
> - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> - Booting with "pcie_aspm=off"
> - Booting with "pcie_aspm.policy=performance"
> - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
>   before suspending
> - Connecting a USB flash drive
>
> Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> Reported-by: Michael Schaller <michael@xxxxxxxxxxx>
> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@xxxxxxxxxxx
> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---

> +/* @pdev: the root port or switch downstream port */
> +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> +{
> +	struct pcie_link_state *link = pdev->link_state;
> +
> +	if (aspm_disabled || !link)
> +		return;
> +	/*
> +	 * Devices changed PM state, we should recheck if latency
> +	 * meets all functions' requirement
> +	 */
> +	down_read(&pci_bus_sem);
> +	mutex_lock(&aspm_lock);
> +	pcie_update_aspm_capable(link->root);
> +	pcie_config_aspm_path(link);
> +	mutex_unlock(&aspm_lock);
> +	up_read(&pci_bus_sem);
> +}

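This function has now been restored in 6.7 final, but it is called from
paths that already hold pci_bus_sem, as lockdep reports (see the splat
below).

Taking the semaphore for reading a second time can deadlock if a writer
queues up in between, and the splat specifically prevents using lockdep
on Qualcomm platforms, since lockdep disables itself after the first
report.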

I'm not sure whether you'd prefer to pass down to
pcie_aspm_pm_state_change() whether the bus semaphore is already held,
or whether some alternative to restoring this function should be
explored instead.

Johan


============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
pcieport 0002:00:00.0: PME: Signaling with IRQ 197
pcieport 0002:00:00.0: AER: enabled with IRQ 197
nvme nvme0: pci function 0002:01:00.0
nvme 0002:01:00.0: enabling device (0000 -> 0002)

but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);

*** DEADLOCK ***

May be due to missing lock nesting notation

4 locks held by kworker/u16:5/90:
#0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
#1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
#2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
#3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc

stack backtrace:
CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
Workqueue: events_unbound async_run_entry_fn
Call trace:
dump_backtrace+0x9c/0x11c
show_stack+0x18/0x24
dump_stack_lvl+0x60/0xac
dump_stack+0x18/0x24
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]