Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: LB F

Date: Sat Mar 28 2026 - 07:42:37 EST


Hi Bitterblue,

Apologies for the delayed response. I applied your diagnostic patch
right away but held off on replying because the NULL pointer crash
has not reproduced since — it has been over 36 hours now with no
oops, which is unusual (previously it occurred in 4 out of 7 boots,
typically within 2 minutes to 24 hours).

I wanted to wait and collect the hex dumps from the crash-time burst
(the 50+ "unused phy status page" events that always preceded the
oops), as those would be the most valuable. Unfortunately, the crash
hasn't happened yet during this session. If/when it does, I will
follow up immediately with those dumps.

In the meantime, here is what I have so far. The patch is working
and producing output. I collected 76 "unused phy status page" events
during this boot, with the following time distribution:

  14:01          1 event   (isolated)
  16:33          1 event
  16:57-17:00   73 events  (burst over ~3 minutes, no crash followed)
  00:03          1 event   (isolated)

Page number distribution (no page 0 or 1, all are "garbage" pages):

  page 10: 10    page  7: 8    page  8: 7    page 13: 7
  page 11:  7    page  9: 6    page 15: 6    page 12: 6
  page  4:  5    page  2: 5    page 14: 4    page  5: 2
  page  3:  2    page  6: 1
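
For anyone who wants to reproduce this tally from their own log, here is
a minimal Python sketch. It assumes only the message format visible in
the dumps below ("unused phy status page (N)"); the name count_pages and
the sample list are illustrative and not part of the diagnostic patch.

```python
import re
from collections import Counter

# Matches the driver message as it appears in the kernel log.
PAGE_RE = re.compile(r"unused phy status page \((\d+)\)")


def count_pages(log_lines):
    """Return a Counter mapping page number -> number of events."""
    counts = Counter()
    for line in log_lines:
        match = PAGE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three header lines copied from this mail, plus one hex-dump line
# that must be ignored by the matcher.
sample = [
    "rtw_8821ce 0000:13:00.0: unused phy status page (9)",
    "00000000: c7 5e 9c 9d 91 69 4d dc b0 67 c2 09 84 33 00 00",
    "rtw_8821ce 0000:13:00.0: unused phy status page (14)",
    "rtw_8821ce 0000:13:00.0: unused phy status page (12)",
]

counts = count_pages(sample)
for page, n in sorted(counts.items()):
    print(f"page {page}: {n}")
print(f"total: {sum(counts.values())}")
```

In practice the list would be replaced by the lines of a saved kernel
log; the sum of the per-page counts should equal the number of events.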

Here are representative hex dumps. I'm showing the byte-level dump
(second print_hex_dump) since it is easier to read:

Isolated event (page 9):

rtw_8821ce 0000:13:00.0: unused phy status page (9)
00000000: c7 5e 9c 9d 91 69 4d dc b0 67 c2 09 84 33 00 00 .^...iM..g...3..
00000010: 00 1e fe 3f cf f2 f0 08 01 29 00 00 00 11 2a 01 ...?.....)....*.
00000020: 0e 00 00 00 00 00 00 20 .......

Burst event (page 14):

rtw_8821ce 0000:13:00.0: unused phy status page (14)
00000000: bd 2c e0 3d 00 00 00 11 87 0a 40 80 88 33 00 00 .,.=......@..3..
00000010: 00 1e fe 3f 3e b6 9b 44 01 2e 00 00 00 11 2a 01 ...?>..D......*.
00000020: 20 00 00 00 00 00 00 20 ......

Burst event (page 12), where byte 0x10 is 0x7e instead of the usual 0x00:

rtw_8821ce 0000:13:00.0: unused phy status page (12)
00000000: 1c b3 7f 15 d1 94 95 7e 70 5e f4 e3 b4 a1 bf 10 .......~p^......
00000010: 7e 1e fe 3f 2e f1 62 44 01 2c 00 00 00 11 2a 01 ~..?..bD.,....*.
00000020: 14 00 00 00 00 00 00 20 .......

Burst event (page 2) — contains MAC addresses:

rtw_8821ce 0000:13:00.0: unused phy status page (2)
00000000: 88 55 51 95 d1 66 ad 50 2f 25 3f 89 ae 35 ef 77 .UQ..f.P/%?..5.w
00000010: 00 1e fe 3f 89 68 62 4d 88 42 40 00 8c c8 4b 68 ...?.hbM.B@...Kh
00000020: d1 63 6c 68 a4 1c 97 5b .clh...[

Note: bytes 0x1c-0x21 are 8c:c8:4b:68:d1:63, my adapter's MAC.
Bytes 0x22-0x27 are 6c:68:a4:1c:97:5b, the AP's BSSID. The dump is
only 40 bytes, so it ends at 0x27 and the rest of the frame is cut off.
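
The address positions can be double-checked mechanically. The sketch
below is only an illustration: it re-enters the 40 bytes of the page 2
dump above and searches for the two MAC addresses. The helper names
(mac_to_bytes, find_mac) are hypothetical and not taken from the driver.

```python
# The 40 bytes of the "page 2" dump, hex column only.
PAGE2_DUMP = bytes.fromhex(
    "88 55 51 95 d1 66 ad 50 2f 25 3f 89 ae 35 ef 77 "
    "00 1e fe 3f 89 68 62 4d 88 42 40 00 8c c8 4b 68 "
    "d1 63 6c 68 a4 1c 97 5b"
)


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def find_mac(dump, mac):
    """Return the offset of the MAC inside the dump, or -1 if absent."""
    return dump.find(mac_to_bytes(mac))


for label, mac in (
    ("adapter", "8c:c8:4b:68:d1:63"),
    ("AP", "6c:68:a4:1c:97:5b"),
):
    off = find_mac(PAGE2_DUMP, mac)
    if off < 0:
        print(f"{label} {mac}: not found")
    else:
        # A MAC is 6 bytes long, so it ends at off + 5.
        print(f"{label} {mac}: bytes 0x{off:02x}-0x{off + 5:02x}")
```

Searching for the byte sequence rather than hard-coding an offset avoids
counting mistakes when reading the dump by eye.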

Burst event (page 15), mostly random, though 0x10-0x13 still read 00 1e fe 3f:

rtw_8821ce 0000:13:00.0: unused phy status page (15)
00000000: c6 a1 92 1c a7 68 6b 97 12 bd ad 89 30 98 ab 94 .....hk.....0...
00000010: 00 1e fe 3f ec 3f 3e 44 1f c2 91 41 0e 9b 54 5f ...?.?>D...A..T_
00000020: 30 eb 40 18 6f d3 25 62 0.@.o.%b

Burst event (page 10), where the pattern at offset 0x10 is completely different:

rtw_8821ce 0000:13:00.0: unused phy status page (10)
00000000: cb 1c 2a df f1 69 d0 05 58 c0 e8 0e d0 59 87 6e ..*..i..X....Y.n
00000010: 63 7e 56 f0 95 fa b8 d3 d5 4b 3e fa b0 0c 0e be c~V......K>.....
00000020: 42 28 14 89 15 c1 fd ad B(......

Last isolated event (page 4):

rtw_8821ce 0000:13:00.0: unused phy status page (4)
00000000: 97 ee fa 4e 04 90 00 21 c0 0f 89 80 b3 33 00 00 ...N...!.....3..
00000010: 00 1e fe 3f 97 7e 64 90 5d 3e 74 fa 70 e0 39 65 ...?.~d.]>t.p.9e
00000020: 48 a4 40 d3 de a9 85 15 H.@.....

Observations:

- Bytes at offsets 0x0e-0x0f are 00 00 or low values in most dumps,
  but completely random in others.
- Bytes 0x11-0x13 are almost always 1e fe 3f (with byte 0x10
  being 00 or 7e), suggesting this is a consistent part of the
  RX descriptor that is not corrupted.
- The "page 2" dump at 17:00:23 clearly contains the adapter
  and AP MAC addresses, confirming this is real RX frame data.
- Some dumps (page 10, page 5, page 15) have random data with
  little or no recognizable RX descriptor structure. The page 15
  sample above still has 00 1e fe 3f at 0x10-0x13; the page 10
  sample has nothing recognizable at all.
- The 73-event burst at 16:57-17:00 was spread over ~3 minutes and
  did NOT result in a crash this time. Previously, similar bursts
  of 50+ events within ~1 second always led to the NULL pointer
  dereference in rtw_fw_c2h_cmd_handle+0x127.
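
The 1e fe 3f observation can be checked across the seven sample dumps
with a short script. This is a sketch only: it uses the 0x10 row of each
dump shown in this mail, and the names ROWS_AT_0X10 and has_marker are
made up for the illustration.

```python
# The recurring bytes seen at offsets 0x11-0x13.
MARKER = bytes.fromhex("1e fe 3f")

# Row 0x10-0x1f of each sample dump in this mail, keyed by page number.
ROWS_AT_0X10 = {
    9: "00 1e fe 3f cf f2 f0 08 01 29 00 00 00 11 2a 01",
    14: "00 1e fe 3f 3e b6 9b 44 01 2e 00 00 00 11 2a 01",
    12: "7e 1e fe 3f 2e f1 62 44 01 2c 00 00 00 11 2a 01",
    2: "00 1e fe 3f 89 68 62 4d 88 42 40 00 8c c8 4b 68",
    15: "00 1e fe 3f ec 3f 3e 44 1f c2 91 41 0e 9b 54 5f",
    10: "63 7e 56 f0 95 fa b8 d3 d5 4b 3e fa b0 0c 0e be",
    4: "00 1e fe 3f 97 7e 64 90 5d 3e 74 fa 70 e0 39 65",
}


def has_marker(row_hex):
    """True if bytes 0x11-0x13 of the row equal 1e fe 3f."""
    row = bytes.fromhex(row_hex)
    # The row starts at 0x10, so 0x11-0x13 are indices 1..3.
    return row[1:4] == MARKER


missing = sorted(p for p, row in ROWS_AT_0X10.items() if not has_marker(row))
print("sample pages without 1e fe 3f at 0x11-0x13:", missing)
```

Running the same check over all 76 dumps would show how many events
lost the marker entirely.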

I will keep monitoring and will send the crash-time dumps as soon as
the oops reproduces.

Thanks for looking into this.

Best regards,
Oleksandr Havrylov