Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: LB F

Date: Sat Mar 28 2026 - 17:32:40 EST


Following up on the corrected patch — I tried to trace the RX path
myself to understand the full picture. I am not a developer
and may be misreading the code, so please take this with a grain
of salt. But I thought some of these observations might be useful.

---

Tracing the RX path from DMA to crash
--------------------------------------

In rtw_pci_rx_napi() (pci.c), each frame from the DMA ring is
processed like this:

1. rtw_pci_dma_check(): compares rx_tag, but only warns on
mismatch and does not skip the frame (pci.c:696).

2. dma_sync_single_for_cpu(): syncs 11478 bytes
(RTK_PCI_RX_BUF_SIZE) from device to CPU.

3. rtw_rx_query_rx_desc(): parses all RX descriptor fields
from W0..W5 with no validation (rx.c:305-325):

    pkt_len     = W0[13:0]   range 0..16383
    drv_info_sz = W0[19:16]  range 0..15, then *8 = 0..120
    shift       = W0[25:24]  range 0..3
    physt       = W0[26]     0 or 1
    is_c2h      = W2[28]     0 or 1

None of these fields are checked against expected values.

4. pkt_offset = 24 + drv_info_sz + shift
With garbage, this can be up to 24 + 120 + 3 = 147.

5. new_len = pkt_len + pkt_offset (pci.c:1088)
With garbage, this can be up to 16383 + 147 = 16530,
which exceeds RTK_PCI_RX_BUF_SIZE (11478).
skb_put_data() then copies new_len bytes from the DMA
buffer — potentially reading past the end.

6. If is_c2h == 1 (from garbage W2 bit 28), the frame goes
to rtw_fw_c2h_cmd_rx_irqsafe() (pci.c:1096-1097).
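
To make the arithmetic in steps 3 to 5 easier to check, here is a
minimal user-space sketch in C. It is not the driver code: the
struct, the function names and the RX_* macros are made up for
illustration, and only the bit positions and the two constants
(24 and 11478) are taken from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the values are the ones quoted in the list above. */
#define RX_DESC_SZ   24u      /* fixed part of pkt_offset (step 4) */
#define RX_BUF_SIZE  11478u   /* RTK_PCI_RX_BUF_SIZE (step 2) */

/* Hypothetical container for the fields named in step 3. */
struct rx_fields {
    uint32_t pkt_len;      /* W0[13:0] */
    uint32_t drv_info_sz;  /* W0[19:16], already multiplied by 8 */
    uint32_t shift;        /* W0[25:24] */
    uint32_t physt;        /* W0[26] */
    uint32_t is_c2h;       /* W2[28] */
};

/* Extract the fields from descriptor words already converted to CPU order. */
static struct rx_fields rx_parse(uint32_t w0, uint32_t w2)
{
    struct rx_fields f;

    f.pkt_len     = w0 & 0x3fffu;
    f.drv_info_sz = ((w0 >> 16) & 0xfu) * 8u;
    f.shift       = (w0 >> 24) & 0x3u;
    f.physt       = (w0 >> 26) & 0x1u;
    f.is_c2h      = (w2 >> 28) & 0x1u;
    return f;
}

/* Steps 4 and 5: number of bytes the copy would cover. */
static uint32_t rx_new_len(const struct rx_fields *f)
{
    assert(f != NULL);
    return f->pkt_len + RX_DESC_SZ + f->drv_info_sz + f->shift;
}
```

Feeding all-ones words into rx_parse() gives the worst case
described in steps 4 and 5, and rx_new_len() then returns a value
larger than RX_BUF_SIZE.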

In rtw_fw_c2h_cmd_rx_irqsafe() (fw.c:351):

7. c2h = skb->data + pkt_offset
c2h->id is simply read from that offset — a random byte
from garbage data. No validation against known C2H IDs.

8. If c2h->id is not C2H_BT_MP_INFO, C2H_WLAN_RFON, or
C2H_SCAN_RESULT, the skb goes to c2h_queue for deferred
processing via the default case (fw.c:377-381).

In rtw_c2h_work() -> rtw_fw_c2h_cmd_handle() (fw.c:302):

9. mutex_lock(&rtwdev->mutex)
c2h->id is matched against the switch cases.
If it happens to be 0x37 (C2H_ADAPTIVITY):
rtw_fw_adaptivity_result() dereferences
rtwdev->chip->edcca_th, which is NULL for RTL8821C.
Kernel oops. Mutex never unlocked.

So the crash is probabilistic — it requires a garbage frame
where W2 bit 28 is 1 (is_c2h) AND the byte at pkt_offset
happens to be 0x37. This explains why not every burst of
corrupted frames results in a crash.
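
The combination of conditions can be written down as a small
user-space model in C. This is only my reading of the path, not
driver code: struct chip_model and both functions are hypothetical,
and the only values taken from the trace above are 0x37 and the
NULL edcca_th pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define C2H_ADAPTIVITY_ID 0x37u   /* ID quoted in step 9 */

/* Hypothetical stand-in for the chip description; one field only. */
struct chip_model {
    const uint32_t *edcca_th;     /* NULL when the chip has no such table */
};

/* True when a frame would reach the NULL dereference from step 9. */
static bool frame_hits_oops(const struct chip_model *chip,
                            bool is_c2h, uint8_t c2h_id)
{
    assert(chip != NULL);
    return is_c2h && c2h_id == C2H_ADAPTIVITY_ID &&
           chip->edcca_th == NULL;
}

/* One possible guard: skip the adaptivity handler without a table. */
static bool adaptivity_allowed(const struct chip_model *chip)
{
    assert(chip != NULL);
    return chip->edcca_th != NULL;
}
```

A check along the lines of adaptivity_allowed() would only hide the
oops; the garbage frame would still have been accepted as C2H.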

---

Concrete example with a captured dump
--------------------------------------

Taking the "page 2" dump with MAC addresses:

00000000: 88 55 51 95 d1 66 ad 50 2f 25 3f 89 ae 35 ef 77

W0 (bytes 0-3, little-endian) = 0x95515588
pkt_len = W0[13:0] = 0x1588 = 5512
drv_info_sz = W0[19:16] = 0x1 -> *8 = 8
shift = W0[25:24] = 1
physt = W0[26] = 1

W2 (bytes 8-11, little-endian) = 0x893f252f
is_c2h = bit 28 = (0x893f252f >> 28) & 1 = 0x8 & 1 = 0
(In this particular frame is_c2h = 0, so no C2H path.)

But drv_info_sz = 1 (should be 0 or 4 per your observation),
confirming the frame is corrupted.
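
For anyone who wants to re-check the decoding, here is a small
user-space C sketch that reads the words from the dump bytes and
extracts a bit range. The helper names le32_at() and field() are
mine, not from the driver; the bytes are the 16 quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 16 bytes of the dump quoted above. */
static const uint8_t dump[16] = {
    0x88, 0x55, 0x51, 0x95, 0xd1, 0x66, 0xad, 0x50,
    0x2f, 0x25, 0x3f, 0x89, 0xae, 0x35, 0xef, 0x77,
};

/* Read one little-endian 32-bit word starting at byte offset off. */
static uint32_t le32_at(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return (uint32_t)buf[off] |
           ((uint32_t)buf[off + 1] << 8) |
           ((uint32_t)buf[off + 2] << 16) |
           ((uint32_t)buf[off + 3] << 24);
}

/* Extract bits hi..lo (inclusive) of word, with 31 >= hi >= lo. */
static uint32_t field(uint32_t word, unsigned int hi, unsigned int lo)
{
    unsigned int width = hi - lo + 1u;
    uint32_t mask = (width >= 32u) ? 0xffffffffu : ((1u << width) - 1u);

    return (word >> lo) & mask;
}
```

For example, field(le32_at(dump, 0), 13, 0) is pkt_len and
field(le32_at(dump, 8), 28, 28) is is_c2h.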

---

pci bus timeout
---------------

I checked all our saved logs across 29 boots and 41 resume
cycles: zero "pci bus timeout" messages anywhere. This means
rtw_pci_dma_check() never detects a rx_tag mismatch — the
buffer descriptor passes validation, but the buffer content
is corrupted. So the corruption seems to happen at a level
that rx_tag does not catch.

---

I also noticed that new_len is not bounds-checked against
RTK_PCI_RX_BUF_SIZE before the skb_put_data() copy
(pci.c:1088-1094), which might be worth looking at
independently of this bug.
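
To illustrate what I mean, a bounds check could look roughly like
the user-space sketch below. rx_len_in_bounds() and RX_BUF_SIZE are
hypothetical names; I am not suggesting this is how the driver
should spell it, only showing the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 11478u   /* RTK_PCI_RX_BUF_SIZE as quoted earlier */

/* Hypothetical helper: true when the copy stays inside the DMA buffer. */
static bool rx_len_in_bounds(uint32_t pkt_len, uint32_t pkt_offset)
{
    /* Add in 64 bits so the sum cannot wrap, whatever the inputs. */
    uint64_t new_len = (uint64_t)pkt_len + pkt_offset;

    return new_len <= RX_BUF_SIZE;
}
```

With the worst-case values from the trace (pkt_len 16383,
pkt_offset 147) the helper returns false, so such a frame would be
dropped before the copy.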

Again, I'm sure you will see things I've missed. Happy to
test anything.

Best regards,
Oleksandr Havrylov