Re: [PATCH] PCI: starfive: Enable PCIe controller's PM runtime before probing host bridge

From: Bjorn Helgaas
Date: Thu Oct 10 2024 - 16:52:50 EST


On Thu, Oct 10, 2024 at 01:29:50PM -0700, Mayank Rana wrote:
> Commit 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
> bridges") enables runtime PM for host bridge enforcing dependency chain
> between PCIe controller, host bridge and endpoint devices. With this,
> Starfive PCIe controller driver's probe enables host bridge (child device)
> PM runtime before parent's PM runtime (Starfive PCIe controller device)
> causing below warning and callstack:

I don't want the bisection hole that would result if we kept
02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") and applied this patch on top of it.

If this is the fix, we'll apply it *first*, followed by 02787a3b4d10
(which will obviously become a different commit), so the locking
problem below described below should never exist in -next or the
upstream tree.

So we need to audit other drivers to make sure they don't have the
same problem as starfive, then make a series with this patch and any
others that need similar fixes, followed by the "PCI/PM: Enable
runtime power management for host bridges" patch.

Presumably this patch fixes a problem all by itself, even without
considering 02787a3b4d10, so the commit message should describe that
problem.

> pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device
> with active children
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.12.0-rc1+ #15438 Not tainted
> ------------------------------------------------------
> systemd-udevd/159 is trying to acquire lock:
> ffffffff81822520 (console_owner){-.-.}-{0:0}, at:
> console_lock_spinning_enable+0x3a/0x60
>
> but task is already holding lock:
> ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
> pm_runtime_enable+0x1e/0xb6
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->power.lock){-...}-{2:2}:
> lock_acquire.part.0+0xa2/0x1d4
> lock_acquire+0x44/0x5a
> _raw_spin_lock_irqsave+0x3a/0x64
> __pm_runtime_resume+0x40/0x86
> __uart_start+0x40/0xb2
> uart_write+0x90/0x220
> n_tty_write+0x10a/0x40e
> file_tty_write.constprop.0+0x10c/0x230
> redirected_tty_write+0x84/0xbc
> do_iter_readv_writev+0x100/0x166
> vfs_writev+0xc6/0x398
> do_writev+0x5c/0xca
> __riscv_sys_writev+0x16/0x1e
> do_trap_ecall_u+0x1b6/0x1e2
> _new_vmalloc_restore_context_a0+0xc2/0xce
>
> -> #1 (&port_lock_key){-.-.}-{2:2}:
> lock_acquire.part.0+0xa2/0x1d4
> lock_acquire+0x44/0x5a
> _raw_spin_lock_irqsave+0x3a/0x64
> serial8250_console_write+0x2a0/0x474
> univ8250_console_write+0x22/0x2a
> console_flush_all+0x2f6/0x3c8
> console_unlock+0x80/0x1a8
> vprintk_emit+0x10e/0x2e0
> vprintk_default+0x16/0x1e
> vprintk+0x1e/0x3c
> _printk+0x36/0x50
> register_console+0x292/0x418
> serial_core_register_port+0x6d6/0x6dc
> serial_ctrl_register_port+0xc/0x14
> uart_add_one_port+0xc/0x14
> serial8250_register_8250_port+0x288/0x428
> dw8250_probe+0x422/0x518
> platform_probe+0x4e/0x92
> really_probe+0x10a/0x2da
> __driver_probe_device.part.0+0xb2/0xe8
> driver_probe_device+0x78/0xc4
> __device_attach_driver+0x66/0xc6
> bus_for_each_drv+0x5c/0xb0
> __device_attach+0x84/0x13c
> device_initial_probe+0xe/0x16
> bus_probe_device+0x88/0x8a
> deferred_probe_work_func+0xd4/0xee
> process_one_work+0x1e0/0x534
> worker_thread+0x166/0x2cc
> kthread+0xc4/0xe0
> ret_from_fork+0xe/0x18
>
> -> #0 (console_owner){-.-.}-{0:0}:
> check_noncircular+0x10e/0x122
> __lock_acquire+0x105c/0x1f4a
> lock_acquire.part.0+0xa2/0x1d4
> lock_acquire+0x44/0x5a
> console_lock_spinning_enable+0x58/0x60
> console_flush_all+0x2cc/0x3c8
> console_unlock+0x80/0x1a8
> vprintk_emit+0x10e/0x2e0
> dev_vprintk_emit+0xea/0x112
> dev_printk_emit+0x2e/0x48
> __dev_printk+0x40/0x5c
> _dev_warn+0x46/0x60
> pm_runtime_enable+0x98/0xb6
> starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
> platform_probe+0x4e/0x92
> really_probe+0x10a/0x2da
> __driver_probe_device.part.0+0xb2/0xe8
> driver_probe_device+0x78/0xc4
> __driver_attach+0x54/0x162
> bus_for_each_dev+0x58/0xa4
> driver_attach+0x1a/0x22
> bus_add_driver+0xec/0x1ce
> driver_register+0x3e/0xd8
> __platform_driver_register+0x1c/0x24
> starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
> do_one_initcall+0x5e/0x28c
> do_init_module+0x52/0x1ba
> load_module+0x1440/0x18f0
> init_module_from_file+0x76/0xae
> idempotent_init_module+0x18c/0x24a
> __riscv_sys_finit_module+0x52/0x82
> do_trap_ecall_u+0x1b6/0x1e2
> _new_vmalloc_restore_context_a0+0xc2/0xce
>
> other info that might help us debug this:
>
> Chain exists of:
> console_owner --> &port_lock_key --> &dev->power.lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&dev->power.lock);
> lock(&port_lock_key);
> lock(&dev->power.lock);
> lock(console_owner);
>
> *** DEADLOCK ***
>
> 4 locks held by systemd-udevd/159:
> #0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at:
> __driver_attach+0x4c/0x162
> #1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
> pm_runtime_enable+0x1e/0xb6
> #2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at:
> dev_vprintk_emit+0xea/0x112
> #3: ffffffff81822448 (console_srcu){....}-{0:0}, at:
> console_flush_all+0x4e/0x3c8
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #15438
> Hardware name: StarFive VisionFive 2 v1.2A (DT)
> Call Trace:
> [<ffffffff80006a02>] dump_backtrace+0x1c/0x24
> [<ffffffff80b70b3e>] show_stack+0x2c/0x38
> [<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4
> [<ffffffff80b7f946>] dump_stack+0x14/0x1c
> [<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350
> [<ffffffff8007fd76>] check_noncircular+0x10e/0x122
> [<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a
> [<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4
> [<ffffffff800842be>] lock_acquire+0x44/0x5a
> [<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60
> [<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8
> [<ffffffff8008c23e>] console_unlock+0x80/0x1a8
> [<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0
> [<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112
> [<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48
> [<ffffffff80b7a006>] __dev_printk+0x40/0x5c
> [<ffffffff80b7a2be>] _dev_warn+0x46/0x60
> [<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6
> [<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
> [<ffffffff806f83f6>] platform_probe+0x4e/0x92
> [<ffffffff80b7a680>] really_probe+0x10a/0x2da
> [<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8
> [<ffffffff806f6112>] driver_probe_device+0x78/0xc4
> [<ffffffff806f6278>] __driver_attach+0x54/0x162
> [<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4
> [<ffffffff806f5c9e>] driver_attach+0x1a/0x22
> [<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce
> [<ffffffff806f7112>] driver_register+0x3e/0xd8
> [<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24
> [<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
> [<ffffffff800027ba>] do_one_initcall+0x5e/0x28c
> [<ffffffff800bb4d8>] do_init_module+0x52/0x1ba
> [<ffffffff800bcc8a>] load_module+0x1440/0x18f0
> [<ffffffff800bd2f0>] init_module_from_file+0x76/0xae
> [<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a
> [<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82
> [<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2
> [<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce
>
> Fix this issue by enabling starfive pcie controller device's PM runtime
> status before calling into pci_host_probe() through plda_pcie_host_init().
>
> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Fixes: 02787a3b4d10 ("PCI/PM: Enable runtime power management for host bridges")
> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Signed-off-by: Mayank Rana <quic_mrana@xxxxxxxxxxx>
> ---
> drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c
> index 0567ec373a3e..e73c1b7bc8ef 100644
> --- a/drivers/pci/controller/plda/pcie-starfive.c
> +++ b/drivers/pci/controller/plda/pcie-starfive.c
> @@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev)
> if (ret)
> return ret;
>
> + pm_runtime_enable(&pdev->dev);
> + pm_runtime_get_sync(&pdev->dev);
> +
> plda->host_ops = &sf_host_ops;
> plda->num_events = PLDA_MAX_EVENT_NUM;
> /* mask doorbell event */
> @@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev)
> plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS;
> ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops,
> &stf_pcie_event);
> - if (ret)
> + if (ret) {
> + pm_runtime_put_sync(&pdev->dev);
> + pm_runtime_disable(&pdev->dev);
> return ret;
> + }
>
> - pm_runtime_enable(&pdev->dev);
> - pm_runtime_get_sync(&pdev->dev);
> platform_set_drvdata(pdev, pcie);
>
> return 0;
> --
> 2.25.1
>