Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

From: Paolo Abeni

Date: Thu May 28 2026 - 05:28:11 EST


On 5/25/26 5:13 AM, Tim JH Chen wrote:
> v2: Address two concerns raised in AI-assisted code review of v1:
>
> 1. [High] t7xx_dpmaif_resume() was unconditionally restoring state to
> DPMAIF_STATE_PWRON regardless of the state before suspend. If the
> modem had already been moved to DPMAIF_STATE_PWROFF by
> t7xx_dpmaif_md_state_callback() (MD_STATE_EXCEPTION or
> MD_STATE_STOPPED) prior to system suspend, resume would incorrectly
> re-arm the TX kthread guard, allowing TX HW writes against a modem
> the MD state machine considers stopped or in exception.
>
> Fix: save dpmaif_ctrl->state into pre_suspend_state at the start of
> t7xx_dpmaif_suspend() and restore that saved value in
> t7xx_dpmaif_resume(), so a pre-suspend PWROFF is preserved across
> the suspend/resume cycle.
>
> 2. [Medium] The v1 second state check before pm_runtime_resume_and_get()
> only narrowed the TOCTOU window -- it did not close it. The state
> field was a plain enum read and written without any lock or
> READ_ONCE/WRITE_ONCE annotation. After the check passed on one CPU,
> the suspend path on another CPU could still set state=PWROFF and
> begin PM teardown before the kthread reached pm_runtime_resume_and_get(),
> reproducing the deadlock.
>
> Fix: introduce tx_pm_lock (struct mutex) held by the kthread across
> the [state check -> pm_runtime_resume_and_get -> pm_runtime_put]
> sequence. t7xx_dpmaif_suspend() acquires this lock before setting
> DPMAIF_STATE_PWROFF, which serialises with any in-progress kthread
> PM section and guarantees the kthread cannot enter
> pm_runtime_resume_and_get() after the state flag is set.
> READ_ONCE/WRITE_ONCE are added at every access point of the state
> flag that crosses the suspend/resume boundary to prevent
> compiler-visible tearing.
>
> The original v1 description of the root cause and tested fix still
> applies (deadlock between t7xx_dpmaif_tx_hw_push_thread calling
> pm_runtime_resume_and_get() and the system PM suspend path, triggered
> with ASPM L1 enabled after repeated suspend/resume cycles).
>
> Tested: no soft lockup over 500+ suspend/resume cycles with SIM
> registered and ASPM L1 enabled (previously triggered in < 300).
>
> Fixes: 05f7e89ab ("Linux 6.19")
> Signed-off-by: Tim JH Chen <tim.jh.chen@xxxxxxxxxx>
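
For anyone following along, the serialisation described in point 2 can be
modelled in userspace roughly as below. This is only a sketch under
assumptions, not the driver code: every model_* name is hypothetical, a
pthread mutex stands in for tx_pm_lock, an atomic int stands in for the
READ_ONCE/WRITE_ONCE-annotated state field, and a counter stands in for the
pm_runtime_resume_and_get() / HW write / pm_runtime_put() section. Taking
the lock in the resume path is a simplification of the model. It says
nothing about whether the patch itself is correct.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical. */
enum model_state { MODEL_PWROFF = 0, MODEL_PWRON = 1 };

struct model_ctrl {
	pthread_mutex_t tx_pm_lock;  /* stands in for the patch's tx_pm_lock */
	atomic_int state;            /* stands in for dpmaif_ctrl->state */
	int pre_suspend_state;       /* state saved on entry to suspend */
	int hw_pushes;               /* counts simulated TX HW writes */
};

void model_init(struct model_ctrl *c, enum model_state initial)
{
	pthread_mutex_init(&c->tx_pm_lock, NULL);
	atomic_init(&c->state, initial);
	c->pre_suspend_state = initial;
	c->hw_pushes = 0;
}

/* One iteration of the TX thread: check and PM section under one lock. */
bool model_tx_push_once(struct model_ctrl *c)
{
	bool pushed = false;

	pthread_mutex_lock(&c->tx_pm_lock);
	if (atomic_load(&c->state) == MODEL_PWRON) {
		/* pm_runtime_resume_and_get(), HW write, pm_runtime_put() */
		c->hw_pushes++;
		pushed = true;
	}
	pthread_mutex_unlock(&c->tx_pm_lock);
	return pushed;
}

/* Suspend waits for any in-progress TX section, then flips the state. */
void model_suspend(struct model_ctrl *c)
{
	pthread_mutex_lock(&c->tx_pm_lock);
	c->pre_suspend_state = atomic_load(&c->state);
	atomic_store(&c->state, MODEL_PWROFF);
	pthread_mutex_unlock(&c->tx_pm_lock);
}

/* Resume restores whatever state was seen before suspend. */
void model_resume(struct model_ctrl *c)
{
	pthread_mutex_lock(&c->tx_pm_lock);
	atomic_store(&c->state, c->pre_suspend_state);
	pthread_mutex_unlock(&c->tx_pm_lock);
}
```

In this model a push attempted between model_suspend() and model_resume()
is refused, and a controller that was already PWROFF before suspend stays
PWROFF after resume, which is the behaviour points 1 and 2 claim.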

Please have a much more careful read of:

Documentation/process/

especially:

Documentation/process/maintainer-netdev.rst

before your next submission, because this one is still lacking in many ways:

- the subject prefix must include the target tree (net) and a revision
number (for the next iteration: v3)
- the Fixes tag should point to the commit that actually introduced the bug
- the commit message should describe the issue and the fix, as in v1; any
changelog-related information (~all of the above) should land after the
tag area and a '---' separator.
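
As a rough sketch of the layout I mean (everything in angle brackets is a
placeholder to be filled in, not a real value):

    Subject: [PATCH net v3] net: wwan: t7xx: fix race between TX thread and system PM suspend

    <description of the issue and of the fix, as in v1>

    Fixes: <sha of the commit that introduced the bug> ("<subject of that commit>")
    Signed-off-by: Tim JH Chen <tim.jh.chen@xxxxxxxxxx>
    ---
    v3: <what changed since v2>
    v2: <what changed since v1>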

Also sashiko still has quite a few concerns:

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260525031320.519435-1-tim.jh.chen%40wnc.com.tw

and many of them look real.

/P