Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
From: Paolo Abeni
Date: Thu May 21 2026 - 06:54:44 EST
From: AI Reviewer <ai@xxxxxxxxxxx>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: wwan: t7xx: fix race between TX thread and system PM suspend
This patch addresses a soft lockup observed when system suspend races with
the DPMAIF TX kthread. It updates dpmaif_ctrl->state in the suspend and
resume callbacks and adds a second state check in
t7xx_dpmaif_tx_hw_push_thread() before pm_runtime_resume_and_get().
> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> index 7ff33c1d6ac7..315a77e2465a 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> @@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
> if (!dpmaif_ctrl)
> return 0;
>
> + dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
> t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
> t7xx_dpmaif_enable_irq(dpmaif_ctrl);
> t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
[High]
Is the unconditional assignment to DPMAIF_STATE_PWRON safe when the modem
was not running before suspend?
Prior to this patch neither suspend nor resume modified dpmaif_ctrl->state,
so a modem that had been moved to DPMAIF_STATE_PWROFF via
t7xx_dpmaif_md_state_callback() (for MD_STATE_STOPPED or MD_STATE_EXCEPTION,
which calls t7xx_dpmaif_stop()) stayed PWROFF across a suspend/resume cycle.
After the patch t7xx_dpmaif_resume() forces state to PWRON and also
restores que_started=true via t7xx_dpmaif_start_txrx_qs(). The TX kthread's
state guard then permits it to proceed:
if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
continue;
ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
...
t7xx_do_tx_hw_push(dpmaif_ctrl);
Could this cause TX HW writes against a modem that the MD state machine
considers stopped or in exception?
The commit message describes the resume change as "symmetric with the
suspend change", but the suspend transition (PWRON -> PWROFF) tracks an
actual HW teardown, while the resume transition forces PWRON regardless of
the prior MD-driven state. Should the resume path only restore PWRON when
the modem was PWRON before suspend?
> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> index 236d632cf591..d5a5befec71e 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> @@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
> break;
> }
>
> + if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
> + continue;
> +
> ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
> if (ret < 0 && ret != -EACCES)
> return ret;
[Medium]
Does this new check actually close the TOCTOU window described in the
commit message, or does it just narrow it?
dpmaif_ctrl->state is a plain enum read here without any lock,
READ_ONCE/WRITE_ONCE, or memory barrier. It is also written without
locking by t7xx_dpmaif_suspend(), t7xx_dpmaif_resume(), t7xx_dpmaif_start(),
and t7xx_dpmaif_stop() (via t7xx_dpmaif_md_state_callback()), and read
without locking from t7xx_do_tx_hw_push() and the ISR path.
After this new check passes, can the system PM suspend path still set
state = DPMAIF_STATE_PWROFF on another CPU and grab the PM spinlock before
this thread reaches pm_runtime_resume_and_get(), reproducing the same
deadlock the patch is intended to fix?
The commit message states:
add a second state check immediately before
pm_runtime_resume_and_get() to close the TOCTOU window between
the wait_event guard and the pm call.
Given that the window between the new check and the pm_runtime call is
non-zero and the flag has no synchronization, would it be more accurate to
say the window is reduced rather than closed?
For reference, Documentation-style guidance on flags-as-locks (the gate
guard pattern of reading a flag and then operating on shared state without
holding a lock) suggests using a real mutex/spinlock or a completion that
suspend waits on, so the kthread's PM reference is guaranteed to be
dropped before the suspend path runs. Would something along those lines be
a better fit here?
--
This is an AI-generated review.