Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: LB F

Date: Sun Apr 26 2026 - 20:27:46 EST


Hi Bitterblue and Panagiotis,

A quick follow-up to my preliminary report from earlier today.

After confirming that the new module containing the `drv_info_sz`
validation patch was actually loaded via DKMS, I ran a much more
aggressive stress test suite. During a test where I rapidly toggled
power saving (`iw dev wlp19s0 set power_save on/off`) while
concurrently downloading a 10GB file, the hardware bug finally triggered.

The resulting logs revealed an interesting edge case in the
corruption pattern:

1. The hardware emitted a corrupted RX DMA burst.
2. Interestingly, the `pr_err_once("drv_info_sz %d\n", ...)` trap from
your new patch did NOT trigger. This suggests that the corrupted
descriptor happened, purely by coincidence, to carry a `drv_info_sz`
of exactly 24 (`PHY_STATUS_SIZE`) or 0, so it passed the validation
check.
3. Because the descriptor passed the length check, the driver
considered it valid and handed it to the mac80211 stack via NAPI.
mac80211 immediately choked on the corrupted frame and emitted a
WARNING.
4. About 1.3 seconds later, the subsequent garbage in the hardware
burst reached the PHY status processing logic, where Panagiotis's
patch stepped in.

Here is the relevant kernel trace showing this sequence:

[ 1080.394531] WARNING: net/mac80211/rx.c:896 at ieee80211_rx_list+0x8bd/0xf10 [mac80211], CPU#3: irq/51-rtw_pci/519
[ 1080.394802] CPU: 3 UID: 0 PID: 519 Comm: irq/51-rtw_pci Tainted: G IOE 6.19.12-1-default #1
[ 1080.394814] RIP: 0010:ieee80211_rx_list+0x8bd/0xf10 [mac80211]
[ 1080.394921] Call Trace:
[ 1080.394924] <IRQ>
[ 1080.394941] ieee80211_rx_napi+0x55/0xe0 [mac80211]
[ 1080.395025] rtw_pci_rx_napi+0x269/0x360 [rtw_pci]
[ 1080.395038] rtw_pci_napi_poll+0x5b/0x110 [rtw_pci]
[ 1080.395044] __napi_poll+0x30/0x1e0
[ 1080.395050] net_rx_action+0x2ec/0x380
[ 1080.395063] handle_softirqs+0xcd/0x270
[ 1080.395068] do_softirq.part.0+0x3b/0x60
[ 1080.395073] </IRQ>
...
[ 1081.708220] rtw_8821ce 0000:13:00.0: unused phy status page (10)

Crucially, because Panagiotis's patch (`if (!edcca_th) return;`) was
active, the system cleanly dropped the PHY status garbage at
[ 1081.708220] instead of hitting the NULL pointer dereference.

The system survived. No kernel panic occurred, and the driver
recovered immediately and continued the 10GB download without
dropping the connection.

Conclusion:
This shows that checking `drv_info_sz` is an excellent primary
defense, but it is not bulletproof: random hardware garbage can
coincidentally carry a "valid" length. Panagiotis's NULL check
remains a vital secondary layer of defense that safely neutralizes
garbage slipping past the length validation.
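To illustrate why the two checks complement each other, here is a
minimal userspace C sketch. It is NOT the rtw88 code: the struct,
`drv_info_sz_ok()`, `lookup_page()`, `accept_desc()` and the page
table are hypothetical stand-ins, and it assumes the length check
accepts exactly 0 or `PHY_STATUS_SIZE` (24), as described above.
Garbage that happens to carry a valid length passes layer 1 and is
only stopped by the NULL guard in layer 2.

```c
#include <stdbool.h>
#include <stddef.h>

#define PHY_STATUS_SIZE 24 /* value quoted in the report */

/* Hypothetical, simplified descriptor: only the fields this sketch needs. */
struct rx_desc_sketch {
	unsigned int drv_info_sz;
	unsigned int phy_status_page;
};

/* Layer 1 (assumed semantics): accept only 0 or PHY_STATUS_SIZE. */
static bool drv_info_sz_ok(const struct rx_desc_sketch *d)
{
	return d->drv_info_sz == 0 || d->drv_info_sz == PHY_STATUS_SIZE;
}

/* Hypothetical per-page table; only pages 0 and 1 are "known" here. */
static const int page_table[] = { 10, 20 };

/* Returns NULL for any page the table does not know about. */
static const int *lookup_page(unsigned int page)
{
	if (page >= sizeof(page_table) / sizeof(page_table[0]))
		return NULL;
	return &page_table[page];
}

/* Layer 2: NULL guard in the style of `if (!edcca_th) return;`. */
static bool process_phy_status(const struct rx_desc_sketch *d)
{
	const int *th = lookup_page(d->phy_status_page);

	if (!th)
		return false; /* drop the data instead of dereferencing NULL */
	return *th > 0;
}

/* True only if the descriptor survives both layers. */
static bool accept_desc(const struct rx_desc_sketch *d)
{
	if (!drv_info_sz_ok(d))
		return false;
	return process_phy_status(d);
}
```

For example, a descriptor with `drv_info_sz = 24` and page 10 passes
`drv_info_sz_ok()` but is still rejected by `accept_desc()`, loosely
mirroring the "unused phy status page (10)" case in the log.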

Together, these two patches provide robust protection against this
hardware defect, and the system is now very stable.

Thank you both for your brilliant work on this!

Best regards,
Oleksandr Havrylov