Re: [PATCH] PCI: starfive: Enable PCIe controller's PM runtime before probing host bridge

From: Krishna Chaitanya Chundru
Date: Fri Oct 11 2024 - 06:47:47 EST




On 10/11/2024 2:22 AM, Bjorn Helgaas wrote:
On Thu, Oct 10, 2024 at 01:29:50PM -0700, Mayank Rana wrote:
Commit 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") enables runtime PM for host bridge enforcing dependency chain
between PCIe controller, host bridge and endpoint devices. With this,
Starfive PCIe controller driver's probe enables host bridge (child device)
PM runtime before parent's PM runtime (Starfive PCIe controller device)
causing below warning and callstack:

I don't want the bisection hole that would result if we kept
02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") and applied this patch on top of it.

If this is the fix, we'll apply it *first*, followed by 02787a3b4d10
(which will obviously become a different commit), so the locking
problem below described below should never exist in -next or the
upstream tree.

So we need to audit other drivers to make sure they don't have theBjorn, I have checked all the drivers in the controller folder where
they are using pm_runtime_enable(), this is the only driver which needs
to be fixed. once this patched was taken can we take "PCI/PM: Enable
runtime power management for host bridges"

- Krishna Chaitanya.
same problem as starfive, then make a series with this patch and any
others that need similar fixes, followed by the "PCI/PM: Enable
runtime power management for host bridges" patch.

Presumably this patch fixes a problem all by itself, even without
considering 02787a3b4d10, so the commit message should describe that
problem.

pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device
with active children

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc1+ #15438 Not tainted
------------------------------------------------------
systemd-udevd/159 is trying to acquire lock:
ffffffff81822520 (console_owner){-.-.}-{0:0}, at:
console_lock_spinning_enable+0x3a/0x60

but task is already holding lock:
ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
pm_runtime_enable+0x1e/0xb6

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->power.lock){-...}-{2:2}:
lock_acquire.part.0+0xa2/0x1d4
lock_acquire+0x44/0x5a
_raw_spin_lock_irqsave+0x3a/0x64
__pm_runtime_resume+0x40/0x86
__uart_start+0x40/0xb2
uart_write+0x90/0x220
n_tty_write+0x10a/0x40e
file_tty_write.constprop.0+0x10c/0x230
redirected_tty_write+0x84/0xbc
do_iter_readv_writev+0x100/0x166
vfs_writev+0xc6/0x398
do_writev+0x5c/0xca
__riscv_sys_writev+0x16/0x1e
do_trap_ecall_u+0x1b6/0x1e2
_new_vmalloc_restore_context_a0+0xc2/0xce

-> #1 (&port_lock_key){-.-.}-{2:2}:
lock_acquire.part.0+0xa2/0x1d4
lock_acquire+0x44/0x5a
_raw_spin_lock_irqsave+0x3a/0x64
serial8250_console_write+0x2a0/0x474
univ8250_console_write+0x22/0x2a
console_flush_all+0x2f6/0x3c8
console_unlock+0x80/0x1a8
vprintk_emit+0x10e/0x2e0
vprintk_default+0x16/0x1e
vprintk+0x1e/0x3c
_printk+0x36/0x50
register_console+0x292/0x418
serial_core_register_port+0x6d6/0x6dc
serial_ctrl_register_port+0xc/0x14
uart_add_one_port+0xc/0x14
serial8250_register_8250_port+0x288/0x428
dw8250_probe+0x422/0x518
platform_probe+0x4e/0x92
really_probe+0x10a/0x2da
__driver_probe_device.part.0+0xb2/0xe8
driver_probe_device+0x78/0xc4
__device_attach_driver+0x66/0xc6
bus_for_each_drv+0x5c/0xb0
__device_attach+0x84/0x13c
device_initial_probe+0xe/0x16
bus_probe_device+0x88/0x8a
deferred_probe_work_func+0xd4/0xee
process_one_work+0x1e0/0x534
worker_thread+0x166/0x2cc
kthread+0xc4/0xe0
ret_from_fork+0xe/0x18

-> #0 (console_owner){-.-.}-{0:0}:
check_noncircular+0x10e/0x122
__lock_acquire+0x105c/0x1f4a
lock_acquire.part.0+0xa2/0x1d4
lock_acquire+0x44/0x5a
console_lock_spinning_enable+0x58/0x60
console_flush_all+0x2cc/0x3c8
console_unlock+0x80/0x1a8
vprintk_emit+0x10e/0x2e0
dev_vprintk_emit+0xea/0x112
dev_printk_emit+0x2e/0x48
__dev_printk+0x40/0x5c
_dev_warn+0x46/0x60
pm_runtime_enable+0x98/0xb6
starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
platform_probe+0x4e/0x92
really_probe+0x10a/0x2da
__driver_probe_device.part.0+0xb2/0xe8
driver_probe_device+0x78/0xc4
__driver_attach+0x54/0x162
bus_for_each_dev+0x58/0xa4
driver_attach+0x1a/0x22
bus_add_driver+0xec/0x1ce
driver_register+0x3e/0xd8
__platform_driver_register+0x1c/0x24
starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
do_one_initcall+0x5e/0x28c
do_init_module+0x52/0x1ba
load_module+0x1440/0x18f0
init_module_from_file+0x76/0xae
idempotent_init_module+0x18c/0x24a
__riscv_sys_finit_module+0x52/0x82
do_trap_ecall_u+0x1b6/0x1e2
_new_vmalloc_restore_context_a0+0xc2/0xce

other info that might help us debug this:

Chain exists of:
console_owner --> &port_lock_key --> &dev->power.lock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&dev->power.lock);
lock(&port_lock_key);
lock(&dev->power.lock);
lock(console_owner);

*** DEADLOCK ***

4 locks held by systemd-udevd/159:
#0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at:
__driver_attach+0x4c/0x162
#1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
pm_runtime_enable+0x1e/0xb6
#2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at:
dev_vprintk_emit+0xea/0x112
#3: ffffffff81822448 (console_srcu){....}-{0:0}, at:
console_flush_all+0x4e/0x3c8

stack backtrace:
CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #15438
Hardware name: StarFive VisionFive 2 v1.2A (DT)
Call Trace:
[<ffffffff80006a02>] dump_backtrace+0x1c/0x24
[<ffffffff80b70b3e>] show_stack+0x2c/0x38
[<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4
[<ffffffff80b7f946>] dump_stack+0x14/0x1c
[<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350
[<ffffffff8007fd76>] check_noncircular+0x10e/0x122
[<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a
[<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4
[<ffffffff800842be>] lock_acquire+0x44/0x5a
[<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60
[<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8
[<ffffffff8008c23e>] console_unlock+0x80/0x1a8
[<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0
[<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112
[<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48
[<ffffffff80b7a006>] __dev_printk+0x40/0x5c
[<ffffffff80b7a2be>] _dev_warn+0x46/0x60
[<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6
[<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
[<ffffffff806f83f6>] platform_probe+0x4e/0x92
[<ffffffff80b7a680>] really_probe+0x10a/0x2da
[<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8
[<ffffffff806f6112>] driver_probe_device+0x78/0xc4
[<ffffffff806f6278>] __driver_attach+0x54/0x162
[<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4
[<ffffffff806f5c9e>] driver_attach+0x1a/0x22
[<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce
[<ffffffff806f7112>] driver_register+0x3e/0xd8
[<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24
[<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
[<ffffffff800027ba>] do_one_initcall+0x5e/0x28c
[<ffffffff800bb4d8>] do_init_module+0x52/0x1ba
[<ffffffff800bcc8a>] load_module+0x1440/0x18f0
[<ffffffff800bd2f0>] init_module_from_file+0x76/0xae
[<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a
[<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82
[<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2
[<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce

Fix this issue by enabling starfive pcie controller device's PM runtime
status before calling into pci_host_probe() through plda_pcie_host_init().

Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Fixes: 02787a3b4d10 ("PCI/PM: Enable runtime power management for host bridges")
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Signed-off-by: Mayank Rana <quic_mrana@xxxxxxxxxxx>
---
drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c
index 0567ec373a3e..e73c1b7bc8ef 100644
--- a/drivers/pci/controller/plda/pcie-starfive.c
+++ b/drivers/pci/controller/plda/pcie-starfive.c
@@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev)
if (ret)
return ret;
+ pm_runtime_enable(&pdev->dev);
+ pm_runtime_get_sync(&pdev->dev);
+
plda->host_ops = &sf_host_ops;
plda->num_events = PLDA_MAX_EVENT_NUM;
/* mask doorbell event */
@@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev)
plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS;
ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops,
&stf_pcie_event);
- if (ret)
+ if (ret) {
+ pm_runtime_put_sync(&pdev->dev);
+ pm_runtime_disable(&pdev->dev);
return ret;
+ }
- pm_runtime_enable(&pdev->dev);
- pm_runtime_get_sync(&pdev->dev);
platform_set_drvdata(pdev, pcie);
return 0;
--
2.25.1