Re: [PATCH] wifi: ath12k: fix CMA error and MHI state mismatch during resume

From: Jayasaikiran Banigallapati

Date: Tue Feb 03 2026 - 00:54:49 EST



On 2/3/26 11:00, Baochen Qiang wrote:

On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
On 2/3/26 08:21, Baochen Qiang wrote:
On 2/2/2026 11:17 PM, Saikiran wrote:
Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
system suspend/resume support but caused a critical regression where
CMA pages are corrupted during resume.

1. CMA page corruption:
    Calling mhi_unprepare_after_power_down() during suspend (via
    ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
    DMA buffers. When these pages are accessed during resume, the kernel
    detects corruption (Bad page state).
How? The FBC and RDDM images get re-allocated at resume, no?

To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().

The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
free the pages:

 mhi_free_bhie_table+0x50/0xa0 [mhi]
 mhi_unprepare_after_power_down+0x30/0x70 [mhi]
 ath12k_mhi_stop+0xf8/0x210 [ath12k]
 ath12k_core_suspend_late+0x94/0xc0 [ath12k]

The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
rddm_image). This suggests that something is still holding a reference to these pages
when DEINIT attempts to free them, causing the kernel to panic before we reach the
resume stage.
This seems like a bug either in the MHI stack or in the kernel DMA/MM subsystems, rather
than in ath12k.

Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
suspend.
First of all, this is a workaround rather than a fix. Ideally we should try to root-cause
the issue and fix it in the right way.


The original comment in the existing code:


/* During suspend we need to use mhi_power_down_keep_dev()
 * workaround, otherwise ath12k_core_resume() will timeout
 * during resume.
 */

This patch aligns the code with this existing intent. The driver was previously calling
DEINIT (and freeing resources) despite the comment advising the use of keep_dev. If the
driver authors intended keep_dev to be used for suspend, then my understanding is that
DEINIT is incorrect here regardless of the underlying MM behavior (correct me if I am
wrong).


Secondly, the workaround here seems problematic: you skip INIT during resume. However, note
that several hardware registers need to be re-programmed during this stage. How could the
target work if its power is cut off during suspend and the register context is not restored
during resume?


In my testing, WiFi functionality was fully restored after resume. The device associates
and passes traffic immediately.

My understanding is that:

ATH12K_MHI_INIT primarily handles host memory allocation (which we preserved by skipping
DEINIT).

ATH12K_MHI_POWER_ON calls mhi_sync_power_up(). This function triggers the MHI state machine,
which handles the necessary BHI/BHIE programming and firmware download (SBL) sequence.

Since mhi_sync_power_up() is still called during resume, the target is correctly
re-initialized and registers are programmed, even if we skip the redundant host memory
allocation step (INIT).
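
To make the ordering I am describing concrete, here is a toy model in plain C. It is only
a sketch, not driver code: every name in it (toy_mhi, toy_set_state, toy_start, toy_stop)
is hypothetical, and it tracks nothing but the lifetime of the host image buffers and a
powered flag. It does not model register programming, so it does not by itself answer the
register context question above.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not an ath12k or MHI symbol. */
enum toy_step {
	TOY_INIT,               /* allocate host-side image buffers */
	TOY_POWER_ON,           /* bring the device up */
	TOY_POWER_OFF,          /* full power down */
	TOY_POWER_OFF_KEEP_DEV, /* power down but keep device state */
	TOY_DEINIT,             /* free host-side image buffers */
};

struct toy_mhi {
	bool images_allocated; /* stands in for fbc_image/rddm_image */
	bool powered;
	int allocs;            /* number of INIT allocations so far */
	int frees;             /* number of DEINIT frees so far */
};

static void toy_set_state(struct toy_mhi *m, enum toy_step step)
{
	switch (step) {
	case TOY_INIT:
		m->images_allocated = true;
		m->allocs++;
		break;
	case TOY_POWER_ON:
		m->powered = true;
		break;
	case TOY_POWER_OFF:
	case TOY_POWER_OFF_KEEP_DEV:
		m->powered = false;
		break;
	case TOY_DEINIT:
		m->images_allocated = false;
		m->frees++;
		break;
	}
}

/* Suspend: power off with keep_dev and skip DEINIT, so nothing is freed. */
static void toy_stop(struct toy_mhi *m, bool is_suspend)
{
	if (is_suspend) {
		toy_set_state(m, TOY_POWER_OFF_KEEP_DEV);
		return;
	}
	toy_set_state(m, TOY_POWER_OFF);
	toy_set_state(m, TOY_DEINIT);
}

/* Resume: skip INIT because the buffers survived suspend; still power on. */
static void toy_start(struct toy_mhi *m, bool is_resume)
{
	if (!is_resume)
		toy_set_state(m, TOY_INIT);
	toy_set_state(m, TOY_POWER_ON);
}
```

In this model, a start, suspend, resume cycle allocates the buffers once and never frees
them; only a non-suspend stop reaches DEINIT.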

Thanks & Regards,
Saikiran