[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
From: Tim JH Chen
Date: Mon May 18 2026 - 04:05:20 EST
When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur
leading to a CPU soft lockup.
The root cause is two-fold:
1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop(), which only stops
the TX work-queue items (by clearing txq->que_started and waiting on
txq->tx_processing). It neither signals the kthread nor updates
dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.
2. The kthread's state guard ("if ... state != DPMAIF_STATE_PWRON")
is evaluated only at the top of each loop iteration. If the thread
has already passed this guard, it proceeds unconditionally to call
pm_runtime_resume_and_get(), which tries to acquire the PM spinlock
that is also held (or contended) by the system PM suspend path.
The result is a spinlock deadlock observed as:
  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]
Reproducing the issue requires ASPM L1 to be enabled on the endpoint
(which extends the time pm_runtime_resume_and_get() holds the PM lock
during L1.2 link retraining) and hundreds of repeated suspend/resume
cycles before it triggers reliably.
Fix this with three coordinated changes:
- In t7xx_dpmaif_suspend(): set state to DPMAIF_STATE_PWROFF immediately
after stopping the TX queue, then call wake_up() on tx_wq so that any
sleeping thread re-evaluates the wait_event condition and stops.
- In t7xx_dpmaif_resume(): restore state to DPMAIF_STATE_PWRON before
re-enabling the TX queues, symmetric with the suspend change.
Without this, the kthread would never wake up after resume.
- In t7xx_dpmaif_tx_hw_push_thread(): add a second state check
immediately before pm_runtime_resume_and_get() to close the TOCTOU
window between the wait_event guard and the PM call.
Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously reproduced in fewer
than 300 cycles).
Signed-off-by: Tim JH Chen <tim.jh.chen@xxxxxxxxxx>
---
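Note for reviewers (not part of the commit message): the check, race
window, re-check ordering described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions. Every
name in it (model_ctrl, model_state, window_hook, model_tx_iteration,
model_suspend_in_window) is hypothetical and does not exist in the
driver, and the suspend path is simulated by a callback rather than a
real concurrent context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's power states. */
enum model_state { MODEL_PWROFF, MODEL_PWRON };

/* Hypothetical model of the control block; not the driver's struct. */
struct model_ctrl {
	enum model_state state;
	int pm_calls;	/* number of simulated PM resume calls */
	/* Runs between the top-of-loop guard and the PM call. */
	void (*window_hook)(struct model_ctrl *ctrl);
};

/* Simulates a suspend that lands after the guard has been passed. */
void model_suspend_in_window(struct model_ctrl *ctrl)
{
	ctrl->state = MODEL_PWROFF;
}

/*
 * One iteration of the modelled TX loop.
 * Returns true if the simulated PM resume call was made.
 */
bool model_tx_iteration(struct model_ctrl *ctrl, bool recheck)
{
	/* Top-of-loop guard, as in the existing thread loop. */
	if (ctrl->state != MODEL_PWRON)
		return false;

	/* Race window: another context may change the state here. */
	if (ctrl->window_hook)
		ctrl->window_hook(ctrl);

	/* Second check placed directly before the PM call. */
	if (recheck && ctrl->state != MODEL_PWRON)
		return false;

	ctrl->pm_calls++;	/* stands in for the PM resume call */
	return true;
}
```

With window_hook set to model_suspend_in_window, calling
model_tx_iteration() with recheck == false still reaches the simulated
PM call, while recheck == true skips it. The model is single-threaded,
so it shows only the ordering of the checks; it says nothing about
memory ordering or locking in the real driver.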
drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c | 3 +++
drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6..315a77e24 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -412,6 +412,8 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
struct dpmaif_ctrl *dpmaif_ctrl = param;
t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+ dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+ wake_up(&dpmaif_ctrl->tx_wq);
t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
if (!dpmaif_ctrl)
return 0;
+ dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
t7xx_dpmaif_enable_irq(dpmaif_ctrl);
t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf..d5a5befec 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
break;
}
+ if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
+ continue;
+
ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
if (ret < 0 && ret != -EACCES)
return ret;
--
2.25.1