Re: [PATCH 12/13] wifi: mt76: mt7925: fix ROC deadlocks and race conditions
From: Felix Fietkau
Date: Tue Jan 27 2026 - 06:06:21 EST
On 20.01.26 21:10, Zac wrote:
> From: Zac Bowling <zac@xxxxxxxxxxxxxx>
>
> Fix multiple interrelated issues in the remain-on-channel (ROC) handling
> that cause deadlocks, race conditions, and resource leaks.
>
> Problems fixed:
>
> 1. Deadlock in sta removal ROC abort path:
>    When a station is removed while a ROC operation is in progress, the
>    driver would call mt7925_roc_abort_sync() which waits for ROC completion.
>    However, the ROC work itself needs to acquire mt792x_mutex which is
>    already held during station removal, causing a deadlock.
>
>    Fix: Use async ROC abort (mt76_connac_mcu_abort_roc) when called from
>    paths that already hold the mutex, and add MT76_STATE_ROC_ABORT flag
>    to coordinate between the abort and the ROC timer.
>
> 2. ROC timer race during suspend:
>    The ROC timer could fire after the device started suspending but before
>    the ROC was properly aborted, causing undefined behavior.
>
>    Fix: Delete ROC timer synchronously before suspend and check device
>    state before processing ROC timeout.
>
> 3. ROC rate limiting for MLO auth failures:
>    Rapid ROC requests during MLO authentication can overwhelm the firmware,
>    causing authentication timeouts. The MT7925 firmware has limited ROC
>    handling capacity.
>
>    Fix: Add rate limiting infrastructure with configurable minimum interval
>    between ROC requests. Track last ROC completion time and defer new
>    requests if they arrive too quickly.
>
> 4. WCID leak in ROC cleanup:
>    When ROC operations are aborted, the associated WCID resources were
>    not being properly released, causing resource exhaustion over time.
>
>    Fix: Ensure WCID cleanup happens in all ROC termination paths.
>
> 5. Async ROC abort race condition:
>    The async ROC abort could race with normal ROC completion, causing
>    double-free or use-after-free of ROC resources.
>
>    Fix: Use MT76_STATE_ROC_ABORT flag and proper synchronization to
>    prevent races between async abort and normal completion paths.
>
> These fixes work together to provide robust ROC handling that doesn't
> deadlock, properly releases resources, and handles edge cases during
> suspend and MLO operations.
>
> Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 device")
> Signed-off-by: Zac Bowling <zac@xxxxxxxxxxxxxx>
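
For reference, the flag-based coordination described in items 1 and 5 can be pictured with a small userspace sketch. This is not the patch's code: struct roc_ctx and the roc_* helpers are hypothetical names, and a C11 atomic_exchange() stands in for what a kernel driver would more likely do with test_and_set_bit() on a state bit such as MT76_STATE_ROC_ABORT. The idea is only that whichever path (async abort or normal completion) sets the flag first owns the teardown, so resources are released exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-device ROC state; not the mt76 structures. */
struct roc_ctx {
	atomic_bool torn_down;  /* set by whichever path claims teardown first */
	int release_count;      /* how many times ROC resources were released */
	bool aborted;           /* true if the abort path won the race */
};

static void roc_ctx_init(struct roc_ctx *ctx)
{
	atomic_init(&ctx->torn_down, false);
	ctx->release_count = 0;
	ctx->aborted = false;
}

/* Returns true only for the first caller; later callers must back off. */
static bool roc_claim_teardown(struct roc_ctx *ctx)
{
	return !atomic_exchange(&ctx->torn_down, true);
}

/* Release resources; the assert documents the "exactly once" invariant. */
static void roc_release(struct roc_ctx *ctx)
{
	assert(ctx->release_count == 0);
	ctx->release_count++;
}

/* Normal completion path (e.g. timer or firmware event). */
static bool roc_complete(struct roc_ctx *ctx)
{
	if (!roc_claim_teardown(ctx))
		return false;   /* abort already owns the teardown */
	roc_release(ctx);
	return true;
}

/* Async abort path: never waits for the completion path. */
static bool roc_abort_async(struct roc_ctx *ctx)
{
	if (!roc_claim_teardown(ctx))
		return false;   /* completion already owns the teardown */
	ctx->aborted = true;
	roc_release(ctx);
	return true;
}
```

Because the abort path never blocks on the completion path, it can be called with a mutex held without recreating the deadlock from item 1. The losing path simply returns without touching the resources.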
The rate limiting code seems a bit suspicious to me.
What does "limited ROC handling capacity" mean? Is it a limit on outstanding ROC requests? Or does the firmware need time to settle after a completed ROC?
This needs to be clarified and likely replaced with a more targeted fix.
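
To make the two readings concrete, here is a minimal userspace sketch of each. Everything in it is an assumption for illustration: struct roc_gate, the gate_* helpers, and the example values are hypothetical, and nothing here reflects known MT7925 firmware behavior. One check caps the number of outstanding requests; the other enforces a settle interval after the last completion. Which of the two (if either) matches the firmware determines what a targeted fix would look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate state; timestamps are caller-supplied monotonic ms. */
struct roc_gate {
	uint32_t max_outstanding;  /* reading A: cap on in-flight requests */
	uint32_t outstanding;
	uint64_t min_interval_ms;  /* reading B: settle time after completion */
	uint64_t last_done_ms;
	bool has_completed;        /* false until the first ROC completes */
};

/* Reading A: refuse a new ROC while too many are still in flight. */
static bool gate_allows_outstanding(const struct roc_gate *g)
{
	return g->outstanding < g->max_outstanding;
}

/* Reading B: refuse a new ROC until the settle interval has elapsed.
 * Assumes now_ms never goes backwards relative to last_done_ms. */
static bool gate_allows_settle(const struct roc_gate *g, uint64_t now_ms)
{
	if (!g->has_completed)
		return true;
	return now_ms - g->last_done_ms >= g->min_interval_ms;
}

/* Record that a ROC request was issued. */
static void gate_start(struct roc_gate *g)
{
	assert(g->outstanding < UINT32_MAX);
	g->outstanding++;
}

/* Record that a ROC request finished at time now_ms. */
static void gate_complete(struct roc_gate *g, uint64_t now_ms)
{
	assert(g->outstanding > 0);
	g->outstanding--;
	g->last_done_ms = now_ms;
	g->has_completed = true;
}
```

With max_outstanding = 1 and min_interval_ms = 50 (arbitrary example values), a request issued at t=0 blocks a second one under reading A until it completes. After completion at t=100, reading B blocks new requests until t=150. The two checks fail at different moments, so they are not interchangeable fixes.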
- Felix