Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: LB F

Date: Mon Mar 30 2026 - 07:34:52 EST

Hi Ping-Ke,

> Oleksandr, is it possible to sum up the conditions these weird frames
> happened? such as enter LPS? with BT devices? or something else.

To be completely honest, I cannot point to a single definitive trigger.
But I went through my kernel logs very carefully and here is what I
found — I'll present just the objective data and let you draw your
own conclusions.

== System context ==

This is a WiFi+BT combo chip (RTL8821CE). Bluetooth is active most of the time —
I constantly use a Soundcore Q10i headset (A2DP audio streaming + AVRCP).
I also use hibernation (suspend-to-disk, S4) frequently. LPS_DEEP is
disabled via the DMI quirk.

== Corrupted frame distribution ==

In one boot session I observed 310 "unused phy status page" events.
They were NOT evenly distributed — they appeared in 3 distinct bursts
separated by hours of clean operation:

  Cluster #1:  00:21-00:38    50 frames over ~17 minutes (gradual)
  Cluster #2:  01:39         120 frames in ~2 seconds (explosive)
  Cluster #3:  12:23-12:26   140 frames over ~3 minutes

Minute-by-minute breakdown:

  Cluster #1        Cluster #2        Cluster #3
  time   frames     time   frames     time   frames
  00:21     1       01:39    120      12:23     48
  00:32     3                         12:24     25
  00:33     4                         12:25     61
  00:34    12                         12:26      6
  00:35     5
  00:36     6
  00:37     6
  00:38    13
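
For anyone who wants to reproduce the per-minute counts and the burst
grouping from their own logs, here is a minimal sketch. It assumes each
log line starts with a "YYYY-MM-DDTHH:MM:SS" timestamp (which, as far as
I know, is what `journalctl -k -o short-iso` prints) and that the message
contains the text "unused phy status page". The 30-minute gap used to
separate bursts is an arbitrary choice for illustration, and the function
names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

MARKER = "unused phy status page"  # message text quoted in this report


def marker_times(lines, marker=MARKER):
    """Return sorted timestamps of lines containing the marker.

    Assumes the first 19 characters of each line are an ISO-style
    timestamp such as 2026-01-02T00:21:20 (any timezone suffix is ignored).
    """
    return sorted(
        datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
        for line in lines
        if marker in line
    )


def per_minute_counts(lines, marker=MARKER):
    """Count marker lines per calendar minute."""
    return Counter(
        ts.strftime("%Y-%m-%d %H:%M") for ts in marker_times(lines, marker)
    )


def find_bursts(lines, marker=MARKER, max_gap=timedelta(minutes=30)):
    """Group marker lines into bursts.

    A new burst starts whenever the gap since the previous marker line
    exceeds max_gap (an assumed threshold, not a driver constant).
    """
    bursts = []
    for ts in marker_times(lines, marker):
        if bursts and ts - bursts[-1]["end"] <= max_gap:
            bursts[-1]["end"] = ts
            bursts[-1]["count"] += 1
        else:
            bursts.append({"start": ts, "end": ts, "count": 1})
    return bursts
```

The gap threshold needs to be larger than the longest pause inside a
burst (Cluster #1 has an 11-minute pause between 00:21 and 00:32) and
much smaller than the hours of clean operation between bursts.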

== Full timeline of key kernel events ==

18:46:40 Cold boot (Linux 6.19.10-1-cachyos)
18:46:49 rtw_8821ce: Firmware version 24.11.0, H2C version 12
18:47:xx wlan0 associated with AP

19:22:40 Hibernation resume #1
19:22:51 wlan0 re-associated

20:09:35 wlan0 deauthenticating (entering hibernation)
20:11:23 Hibernation resume #2
20:11:33 wlan0 re-associated

20:55:01 Bluetooth: hci0: unexpected event for opcode 0xfc19
22:42:25 input: Soundcore Q10i (AVRCP) registered

[ No other kernel events besides atkbd key events until: ]

00:21:20 >>> CLUSTER #1 STARTS (first corrupted frame)
         4h10m after resume #2, 1h39m after BT AVRCP event
00:37:35 WARNING: net/mac80211/rx.c:896 (mac80211 WARN_ON triggered)
00:38:59 Cluster #1 ends

00:47:45 Chrome SharedWorker trap (unrelated userspace crash)

01:39:33 >>> CLUSTER #2 STARTS (120 frames in ~2 seconds)
         Also logged: "unknown pkt rate = 41" (0x41 = 65 decimal,
         far exceeding DESC_RATE_MAX; confirms completely garbled
         RX descriptor)
01:39:34 Cluster #2 ends

[ ~10 hours of clean operation / hibernation ]

12:21:29 Hibernation resume #3
12:21:30 Bluetooth RTL firmware reloaded (rtl8821c_fw.bin)
12:21:40 wlan0 re-associated

12:23:14 >>> CLUSTER #3 STARTS
         Only 94 seconds after WiFi re-association post-resume!
12:26:09 Cluster #3 ends
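
The intervals quoted in the timeline can be re-derived from the
timestamps themselves. The sketch below does only that arithmetic. The
day index (0 for the boot day, 1 for the following day) is inferred from
the order of events in the timeline, and the helper name `at` is
hypothetical.

```python
from datetime import timedelta


def at(day, hms):
    """Offset from midnight of the boot day; day 0 is the boot day."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return timedelta(days=day, hours=hours, minutes=minutes, seconds=seconds)


# Timestamps copied from the timeline above; day index is an inference.
resume_2 = at(0, "20:11:23")     # Hibernation resume #2
avrcp_input = at(0, "22:42:25")  # Soundcore Q10i (AVRCP) registered
cluster_1 = at(1, "00:21:20")    # Cluster #1 starts
cluster_2 = at(1, "01:39:33")    # Cluster #2 starts
reassoc_3 = at(1, "12:21:40")    # wlan0 re-associated after resume #3
cluster_3 = at(1, "12:23:14")    # Cluster #3 starts

deltas = {
    "resume #2 -> cluster #1": cluster_1 - resume_2,
    "AVRCP input -> cluster #1": cluster_1 - avrcp_input,
    "resume #2 -> cluster #2": cluster_2 - resume_2,
    "re-association #3 -> cluster #3": cluster_3 - reassoc_3,
}

for label, delta in deltas.items():
    print(f"{label}: {delta}")
```

Run as written, this gives 4:09:57, 1:38:55, 5:28:10 and 0:01:34, which
matches the rounded figures used in this mail (4h10m, 1h39m, ~5.5h and
94 seconds).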

== Observations (presented carefully, without definitive conclusions) ==

1. The corrupted frames come in BURSTS, not continuously. Between
bursts the adapter works normally for hours.

2. Cluster #3 has a clear temporal correlation with hibernation
resume — bad frames started only 94 seconds after wlan0
re-associated. This is the tightest correlation in the data.

3. However, Clusters #1 and #2 started approximately 4h and 5.5h
after the preceding resume (#2), so hibernation alone does not
explain everything. Something may be accumulating over time.

4. The BT subsystem logged "unexpected event for opcode 0xfc19"
(a vendor-specific RTL HCI command) at 20:55, roughly 3.5 hours
before Cluster #1. I don't know if this event is normal or
indicates a firmware anomaly on the combo chip.

5. The bursts vary dramatically in intensity: Cluster #2 produced
120 frames in ~2 seconds, while Cluster #1 was spread over ~17
minutes. This may point to more than one failure mode within the
chip, but I cannot tell from the logs alone.

6. Between resume #2 and Cluster #1, the ONLY non-keyboard kernel
events were the BT unexpected event (20:55) and BT AVRCP input
device registration for the headset (22:42). No PCIe events,
no driver restarts, no suspend entries.

== Questions ==

Could you advise on how to investigate this further? For example:

- Is there a debug flag or register dump we could capture right
before the first corrupted frame in a burst?
- Would it help to log C2H (chip-to-host) traffic around the
time of these events?

I am ready to run any specific tests you need. In the meantime,
I agree that filtering by DRV_INFO_SIZE is the right practical
solution, and I'm waiting for your official patch to test locally.

Best regards,
Oleksandr Havrylov