Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F
Date: Thu Mar 12 2026 - 20:04:03 EST
Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
> I'm really not sure how/why kernel becomes frozen. As I mentioned before
> it might because of received malformed data and no complete validation
> before reporting RX packet to mac80211.
> Not sure if you can try to dig and add some validation?
Hi Ping-Ke,
I took your advice and performed a deeper audit of the rtw88 PCI implementation,
focusing on both validation and concurrency. While the RX gaps I previously
mentioned are real, I found two critical architectural issues in the TX path
that likely contribute to the "hard freezes" and DMA stalls we've seen.
1. Concurrency: TX Descriptor Management Race (pci.c:836)
---------------------------------------------------------
In rtw_pci_tx_write_data(), rtw88 fetches the descriptor address based on
the current write pointer (wp) BEFORE acquiring the irq_lock:
```c
/* drivers/net/wireless/realtek/rtw88/pci.c:836 */
buf_desc = get_tx_buffer_desc(ring, tx_buf_desc_sz);
memset(buf_desc, 0, tx_buf_desc_sz);
/* ... packets are filled ... */
spin_lock_bh(&rtwpci->irq_lock); // [!] Lock is taken too late
```
Since mac80211 can call rtw_ops_tx and rtw_ops_wake_tx_queue (the latter
calling __rtw_tx_work) concurrently on different CPUs—especially for
high-priority AC_VO traffic—two threads can fetch the same wp for the
same queue simultaneously.
Result: CPU 0 prepares data in slot [N] while CPU 1 simultaneously zeroes or
overwrites the same slot. This would explain the intermittent descriptor
corruption and the subsequent DMA/firmware hangs.
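One way to close that window is to read wp and claim the slot inside the same
critical section that advances it. The following is only a minimal userspace
sketch of that idea, not the driver's code: struct tx_ring, struct tx_desc,
tx_ring_init() and tx_reserve_and_fill() are hypothetical names, RING_LEN is an
arbitrary size, and a pthread mutex stands in for spin_lock_bh() on irq_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define RING_LEN 8u /* hypothetical ring size, must be a power of two */

struct tx_desc {
	uint32_t addr; /* stand-in for the DMA address of the payload */
	uint16_t len;  /* payload length in bytes */
};

struct tx_ring {
	pthread_mutex_t lock; /* stands in for rtwpci->irq_lock */
	uint32_t wp;          /* write pointer: index of the next free slot */
	struct tx_desc desc[RING_LEN];
};

static void tx_ring_init(struct tx_ring *ring)
{
	memset(ring, 0, sizeof(*ring));
	pthread_mutex_init(&ring->lock, NULL);
}

/*
 * Claim the slot at wp, fill it, and advance wp, all under the lock, so two
 * callers can never be handed the same slot. Returns the slot index written.
 * A real ring would also have to check for a full ring before claiming.
 */
static uint32_t tx_reserve_and_fill(struct tx_ring *ring, uint32_t addr,
				    uint16_t len)
{
	struct tx_desc *d;
	uint32_t slot;

	assert((RING_LEN & (RING_LEN - 1u)) == 0u);

	pthread_mutex_lock(&ring->lock);
	slot = ring->wp;          /* wp is read only while holding the lock */
	d = &ring->desc[slot];
	memset(d, 0, sizeof(*d)); /* zeroing can no longer race with a fill */
	d->addr = addr;
	d->len = len;
	ring->wp = (slot + 1u) & (RING_LEN - 1u);
	pthread_mutex_unlock(&ring->lock);

	return slot;
}
```

The point of the sketch is the ordering: lookup, memset, fill, and wp update
form one critical section. In the driver, the equivalent change would move
the get_tx_buffer_desc() call below the spin_lock_bh().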
2. Synchronization: Missing DMA Memory Barrier (pci.c:786)
----------------------------------------------------------
In rtw_pci_tx_kick_off_queue(), the doorbell register is written without a
preceding memory barrier:
```c
/* drivers/net/wireless/realtek/rtw88/pci.c:786 */
rtw_write16(rtwdev, bd_idx, ring->r.wp & TRX_BD_IDX_MASK);
```
For PCIe DMA, it is vital that writes to descriptor memory are visible to
the device before the MMIO doorbell write reaches it. Standard Linux practice
calls for a wmb() (or dma_wmb()) before the doorbell unless the MMIO accessor
already orders prior memory writes. Without that ordering, the Wi-Fi
controller may read stale or uninitialized descriptors, which could lead to
the "failed to leave lps state" timeouts and H2C command failures we've logged.
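The ordering requirement can be illustrated with a userspace analogy. This is
a sketch under stated assumptions, not kernel code: struct fake_dev, struct bd
and publish_desc() are hypothetical, the doorbell is an ordinary atomic
variable rather than an MMIO register, and a C11 release fence plays the role
that wmb() or dma_wmb() would play in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_LEN 4u /* hypothetical ring size */

struct bd {
	uint32_t addr; /* stand-in for the buffer's DMA address */
	uint16_t len;  /* buffer length in bytes */
};

struct fake_dev {
	struct bd ring[RING_LEN];  /* descriptor memory the "device" reads */
	_Atomic uint16_t doorbell; /* stand-in for the MMIO index register */
};

/* Fill one descriptor, then publish the new write index to the device. */
static void publish_desc(struct fake_dev *dev, uint16_t slot, uint32_t addr,
			 uint16_t len)
{
	assert(slot < RING_LEN);

	/* Step 1: plain stores into descriptor memory. */
	dev->ring[slot].addr = addr;
	dev->ring[slot].len = len;

	/*
	 * Step 2: the release fence keeps the descriptor stores above from
	 * being reordered after the doorbell store below.
	 */
	atomic_thread_fence(memory_order_release);

	/* Step 3: ring the doorbell with the next write index. */
	atomic_store_explicit(&dev->doorbell,
			      (uint16_t)((slot + 1u) % RING_LEN),
			      memory_order_relaxed);
}
```

A consumer that observes the new doorbell value with acquire semantics is then
guaranteed to see the filled descriptor. Whether the driver needs an explicit
barrier at this point depends on the ordering guarantees of the accessor
behind rtw_write16().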
3. Confirmed RX Limit Mismatch (rtw8821c.c:254)
-----------------------------------------------
I verified that the hardware is explicitly programmed with a 12KB limit:
```c
/* drivers/net/wireless/realtek/rtw88/rtw8821c.c:254 */
rtw_write8(rtwdev, REG_RX_PKT_LIMIT, WLAN_RX_PKT_LIMIT_512);
```
Since the driver's RX buffer (RTK_PCI_RX_BUF_SIZE) is only about 11.2KB, a
malformed or oversized frame that exceeds the buffer but still fits within the
12KB hardware limit can cause an OOB read in rtw_pci_rx_napi().
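The mismatch and the kind of length check that would guard against it can be
sketched as follows. The helper names rx_hw_limit_bytes() and rx_len_fits()
are hypothetical, and the 512-byte register granularity is an assumption
inferred from the macro name and the 12KB figure. The real values should be
taken from the driver headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_LIMIT_UNIT 512u /* assumed granularity of REG_RX_PKT_LIMIT */

/* Convert a limit register value (in assumed 512-byte units) to bytes. */
static size_t rx_hw_limit_bytes(uint8_t limit_units)
{
	return (size_t)limit_units * RX_LIMIT_UNIT;
}

/*
 * Return true only if an RX descriptor of desc_sz bytes followed by a
 * payload of pkt_len bytes fits entirely inside a buffer of buf_size bytes.
 * Written so that the subtraction cannot underflow.
 */
static bool rx_len_fits(size_t desc_sz, size_t pkt_len, size_t buf_size)
{
	assert(buf_size > 0);
	return desc_sz <= buf_size && pkt_len <= buf_size - desc_sz;
}
```

With a register value of 24, rx_hw_limit_bytes() returns 12288 bytes, which is
larger than an 11.2KB buffer (roughly 11468 bytes, an illustrative rounding).
A frame length reported by the hardware should therefore pass a check like
rx_len_fits() before the driver reads or copies the payload.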
I believe addressing these three points (TX locking, TX barriers, and
RX buffer consistency) would significantly harden the driver against
the stability issues reported in Bug 221195.
Best regards,
Oleksandr Havrylov