Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F
Date: Sat Mar 28 2026 - 17:32:40 EST
Following up on the corrected patch — I tried to trace the RX path
myself to understand the full picture. I am not a developer
and may be misreading the code, so please take this with a grain
of salt. But I thought some of these observations might be useful.
---
Tracing the RX path from DMA to crash
--------------------------------------
In rtw_pci_rx_napi() (pci.c), each frame from the DMA ring is
processed like this:
  1. rtw_pci_dma_check(): compares rx_tag, but only warns on
     mismatch, does not skip the frame (pci.c:696)

  2. dma_sync_single_for_cpu(): syncs 11478 bytes
     (RTK_PCI_RX_BUF_SIZE) from device to CPU

  3. rtw_rx_query_rx_desc(): parses all RX descriptor fields
     from W0..W5 with no validation (rx.c:305-325):

       pkt_len     = W0[13:0]    range 0..16383
       drv_info_sz = W0[19:16]   range 0..15, then *8 = 0..120
       shift       = W0[25:24]   range 0..3
       physt       = W0[26]      0 or 1
       is_c2h      = W2[28]      0 or 1

     None of these fields are checked against expected values.

  4. pkt_offset = 24 + drv_info_sz + shift
     With garbage, this can be up to 24 + 120 + 3 = 147.

  5. new_len = pkt_len + pkt_offset (pci.c:1088)
     With garbage, this can be up to 16383 + 147 = 16530,
     which exceeds RTK_PCI_RX_BUF_SIZE (11478).
     skb_put_data() then copies new_len bytes from the DMA
     buffer, potentially reading past the end.

  6. If is_c2h == 1 (from garbage W2 bit 28), the frame goes
     to rtw_fw_c2h_cmd_rx_irqsafe() (pci.c:1096-1097).
In rtw_fw_c2h_cmd_rx_irqsafe() (fw.c:351):
  7. c2h = skb->data + pkt_offset
     c2h->id is simply read from that offset: a random byte
     from garbage data. No validation against known C2H IDs.

  8. If c2h->id is not C2H_BT_MP_INFO, C2H_WLAN_RFON, or
     C2H_SCAN_RESULT, the skb goes to c2h_queue for deferred
     processing via the default case (fw.c:377-381).
In rtw_c2h_work() -> rtw_fw_c2h_cmd_handle() (fw.c:302):
  9. mutex_lock(&rtwdev->mutex)
     c2h->id is matched against the switch cases.
     If it happens to be 0x37 (C2H_ADAPTIVITY):
       rtw_fw_adaptivity_result() dereferences
       rtwdev->chip->edcca_th, which is NULL for RTL8821C.
       Kernel oops. Mutex never unlocked.
So the crash is probabilistic — it requires a garbage frame
where W2 bit 28 is 1 (is_c2h) AND the byte at pkt_offset
happens to be 0x37. This explains why not every burst of
corrupted frames results in a crash.
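
To make the last step concrete, here is a minimal userspace model of
the dispatch in step 9 with a NULL guard added. It is only a sketch
and not the patch under discussion: struct chip_info, struct
edcca_th's members, enum c2h_result and handle_c2h() are hypothetical
stand-ins, and only the ID value 0x37 and the edcca_th pointer name
come from the trace above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define C2H_ADAPTIVITY 0x37 /* ID value quoted in the trace above */

/* Hypothetical stand-ins for the driver structures (assumption). */
struct edcca_th { int l2h; int h2l; };
struct chip_info { const struct edcca_th *edcca_th; };

enum c2h_result { C2H_HANDLED, C2H_IGNORED, C2H_REJECTED };

/*
 * Model of the switch in step 9. The adaptivity case refuses to
 * touch chip->edcca_th when the chip does not provide it, instead
 * of dereferencing a NULL pointer while the mutex is held.
 */
static enum c2h_result handle_c2h(const struct chip_info *chip, uint8_t id)
{
	assert(chip != NULL);

	switch (id) {
	case C2H_ADAPTIVITY:
		if (!chip->edcca_th)
			return C2H_REJECTED; /* garbage ID on a chip without EDCCA data */
		return C2H_HANDLED;
	default:
		return C2H_IGNORED;
	}
}
```

With edcca_th == NULL (the RTL8821C case described above), ID 0x37
is rejected rather than dereferenced. Whether the real driver should
guard here, validate the ID earlier, or both is for the maintainers
to judge.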
---
Concrete example with a captured dump
--------------------------------------
Taking the "page 2" dump with MAC addresses:
00000000: 88 55 51 95 d1 66 ad 50 2f 25 3f 89 ae 35 ef 77
W0 (bytes 0-3, little-endian) = 0x95515588
  pkt_len     = W0[13:0]  = 0x5588 & 0x3fff = 0x1588 = 5512
  drv_info_sz = W0[19:16] = 0x1 -> *8 = 8
  shift       = W0[25:24] = 1
  physt       = W0[26]    = 1
W2 (bytes 8-11, little-endian) = 0x893f252f
  is_c2h      = W2[28] = (0x893f252f >> 28) & 1 = 0x8 & 1 = 0
(In this particular frame is_c2h = 0, so no C2H path.)
But drv_info_sz = 1 (should be 0 or 4 per your observation),
confirming the frame is corrupted.
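
For anyone who wants to re-check the decoding, here is a small
userspace sketch that extracts the fields from the 16 dump bytes
using the bit positions listed in step 3. The struct and function
names (rx_fields, le32_at, decode_rx_desc) are mine, not the
driver's, and drv_info_sz is returned already multiplied by 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields; names are illustrative, not the driver's. */
struct rx_fields {
	uint32_t pkt_len;     /* W0[13:0] */
	uint32_t drv_info_sz; /* W0[19:16] * 8, in bytes */
	uint32_t shift;       /* W0[25:24] */
	uint32_t physt;       /* W0[26] */
	uint32_t is_c2h;      /* W2[28] */
};

/* Read the little-endian 32-bit word number 'word' from the buffer. */
static uint32_t le32_at(const uint8_t *p, size_t word)
{
	p += word * 4;
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static struct rx_fields decode_rx_desc(const uint8_t *desc)
{
	uint32_t w0 = le32_at(desc, 0);
	uint32_t w2 = le32_at(desc, 2);
	struct rx_fields f;

	f.pkt_len     = w0 & 0x3fff;
	f.drv_info_sz = ((w0 >> 16) & 0xf) * 8;
	f.shift       = (w0 >> 24) & 0x3;
	f.physt       = (w0 >> 26) & 0x1;
	f.is_c2h      = (w2 >> 28) & 0x1;
	return f;
}

/* First 16 bytes of the "page 2" dump quoted above. */
static const uint8_t dump_page2[16] = {
	0x88, 0x55, 0x51, 0x95, 0xd1, 0x66, 0xad, 0x50,
	0x2f, 0x25, 0x3f, 0x89, 0xae, 0x35, 0xef, 0x77,
};

static_assert(sizeof(dump_page2) == 16, "dump must hold W0..W3");
```

Calling decode_rx_desc(dump_page2) and printing the members gives
the values to compare against the hand decoding above.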
---
pci bus timeout
---------------
I checked all our saved logs across 29 boots and 41 resume
cycles: zero "pci bus timeout" messages anywhere. This means
rtw_pci_dma_check() never detects an rx_tag mismatch: the
buffer descriptor passes validation, but the buffer content
is corrupted. So the corruption seems to happen at a level
that rx_tag does not catch.
---
I also noticed that new_len is not bounds-checked against
RTK_PCI_RX_BUF_SIZE before the skb_put_data() copy
(pci.c:1088-1094), which might be worth looking at
independently of this bug.
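
As a rough illustration of the kind of check I mean, here is a
userspace sketch (not driver code). RX_BUF_SIZE and RX_DESC_SIZE
mirror the 11478 and 24 from the trace; the helper name
rx_len_is_sane() and its signature are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE  11478u /* RTK_PCI_RX_BUF_SIZE per the trace */
#define RX_DESC_SIZE 24u    /* constant term of the pkt_offset formula */

/* Worst case from steps 4 and 5: 16383 + 24 + 120 + 3 = 16530. */
static_assert(16383u + RX_DESC_SIZE + 120u + 3u > RX_BUF_SIZE,
	      "unchecked descriptor fields can exceed the RX buffer");

/*
 * Hypothetical helper: compute new_len the same way as step 5 and
 * refuse it when it would read past the end of the DMA buffer.
 * drv_info_sz is in bytes (already multiplied by 8).
 */
static bool rx_len_is_sane(uint32_t pkt_len, uint32_t drv_info_sz,
			   uint32_t shift, uint32_t *new_len)
{
	uint32_t pkt_offset = RX_DESC_SIZE + drv_info_sz + shift;
	uint32_t len = pkt_len + pkt_offset;

	if (len > RX_BUF_SIZE)
		return false; /* caller would drop the frame */

	*new_len = len;
	return true;
}
```

A frame that fails this check would be dropped before the
skb_put_data() copy. What the driver should actually do with such a
frame (drop, count, warn) is an open question.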
Again, I'm sure you will see things I've missed. Happy to
test anything.
Best regards,
Oleksandr Havrylov