Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F
Date: Thu Mar 26 2026 - 19:57:34 EST
Hi Ping-Ke,
This is Oleksandr Havrylov again. Thank you for the ASPM/LPS Deep
quirk and the rate validation patches — they are both working correctly
(zero h2c timeouts, zero lps failures, zero mac80211 warnings).
However, I'm experiencing a different, separate bug that causes a kernel
oops and leaves the system completely unresponsive, requiring a hard
power-off. After disassembling the crash site, I believe I've found
the root cause.
== Summary ==
When firmware sends a C2H_ADAPTIVITY (0x37) command to an RTL8821CE
adapter, rtw_fw_adaptivity_result() dereferences rtwdev->chip->edcca_th
without a NULL check. The RTL8821C chip_info (rtw8821c_hw_spec) does
not define edcca_th, so the pointer is NULL, causing a kernel oops.
The crash occurs on the phy0 workqueue while holding rtwdev->mutex,
which never gets released. This causes all subsequent processes that
touch the network stack to hang in uninterruptible D-state, making
the system completely unresponsive and requiring a hard power-off.
== Root cause analysis ==
rtw_fw_adaptivity_result() in fw.c (line ~282):
static void rtw_fw_adaptivity_result(struct rtw_dev *rtwdev, u8 *payload,
                                     u8 length)
{
    const struct rtw_hw_reg_offset *edcca_th = rtwdev->chip->edcca_th;
    ...
    rtw_dbg(rtwdev, RTW_DBG_ADAPTIVITY, "Reg Setting: L2H %x H2L %x\n",
        rtw_read32_mask(rtwdev, edcca_th[EDCCA_TH_L2H_IDX].hw_reg.addr,
                                ^^^^^^^^^ NULL dereference here
                        edcca_th[EDCCA_TH_L2H_IDX].hw_reg.mask),
    ...
The RTL8822C defines .edcca_th = rtw8822c_edcca_th in its chip_info,
but RTL8821C does not set this field at all — it remains NULL.
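The pointer is NULL rather than garbage because of standard C initializer rules: any member omitted from a designated initializer of a static object is zero-initialized. A minimal userspace sketch of that behaviour follows. All mock_ names and register values are hypothetical stand-ins, not the real rtw88 definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types; not the real rtw88 layout. */
struct mock_hw_reg {
	uint32_t addr;
	uint32_t mask;
};

struct mock_chip_info {
	const char *name;
	const struct mock_hw_reg *edcca_th;	/* optional per-chip table */
};

/* Placeholder register values, chosen only for illustration. */
static const struct mock_hw_reg mock_8822c_edcca_th[] = {
	{ .addr = 0x1000, .mask = 0x00ff },
	{ .addr = 0x1000, .mask = 0xff00 },
};

/* This chip sets the field explicitly. */
static const struct mock_chip_info mock_8822c_spec = {
	.name = "mock-8822c",
	.edcca_th = mock_8822c_edcca_th,
};

/* This chip omits the field, so C zero-initializes it: the pointer is NULL. */
static const struct mock_chip_info mock_8821c_spec = {
	.name = "mock-8821c",
};

/* Returns 1 if the chip provides an EDCCA threshold table, 0 otherwise. */
static int mock_chip_has_edcca_th(const struct mock_chip_info *chip)
{
	return chip->edcca_th != NULL;
}
```

Any code that indexes edcca_th unconditionally therefore works for the first chip and reads from address 0 for the second.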
I verified this by disassembling the compiled rtw_core.ko module:
Crash RIP: rtw_fw_c2h_cmd_handle+0x127
Address: 0x1d527 = movl (%r12), %esi
R12 is loaded at +0xe5 (0x1d4e5):
movq 0x140(%rax), %r12 ; rax = rtwdev->chip
; 0x140 = offset of edcca_th in rtw_chip_info
; R12 = chip->edcca_th = NULL for 8821c
The function is entered via:
+0xd8 (0x1d4d8): cmpl $0x37, %ecx ; c2h->id == C2H_ADAPTIVITY (0x37)
With R12 = 0, the instruction at +0x127:
movl (%r12), %esi ; reads from address 0x0 → NULL pointer dereference
I also confirmed that rtw8821c_hw_spec in the mainline kernel
(torvalds/linux master, rtw8821c.c) does NOT set .edcca_th.
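The offset match above (0x140 in the movq versus the position of edcca_th in the struct) can be cross-checked with offsetof(). The sketch below uses a hypothetical struct whose padding array merely stands in for the real members that precede edcca_th. The real rtw_chip_info layout depends on kernel version and config, so 0x140 here is an assumption taken from my disassembly, not a general constant:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mock_hw_reg {
	uint32_t addr;
	uint32_t mask;
};

/* Hypothetical layout: the array stands in for the members before edcca_th. */
struct mock_chip_info {
	uint8_t earlier_members[0x140];
	const struct mock_hw_reg *edcca_th;
};

/*
 * Mimics `movq off(%rax), %r12`: load a pointer-sized member located at
 * base + off. memcpy avoids any alignment or aliasing assumptions.
 */
static const struct mock_hw_reg *mock_load_member(const void *base, size_t off)
{
	const struct mock_hw_reg *p;

	memcpy(&p, (const unsigned char *)base + off, sizeof(p));
	return p;
}
```

With a zero-initialized mock_chip_info, mock_load_member(&chip, offsetof(struct mock_chip_info, edcca_th)) yields NULL, which corresponds to R12 = 0 in the oops.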
== Reproduction ==
The crash is highly reproducible: it occurred in 4 out of 7 recent
boots. It happens during normal active usage with no specific trigger.
boot   date/time of crash     uptime at crash
-5     2026-03-25 00:58:06    ~2 min
-4     2026-03-25 21:32:00    ~6h
-3     2026-03-26 00:28:14    ~2.5h
-1     2026-03-27 00:56:58    ~23.5h
Both ASPM and LPS Deep are disabled via the DMI quirk. The crash
occurs every time with the same pattern and same RIP offset (+0x127).
== Crash pattern ==
Every crash follows the same sequence:
1) Burst of 50-60 "unused phy status page" messages in ~1 second:
rtw_8821ce 0000:13:00.0: unused phy status page (8)
rtw_8821ce 0000:13:00.0: unused phy status page (2)
... (50+ more within same second)
2) Immediately followed by the kernel oops:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Oops: 0000 [#1] SMP PTI
Workqueue: phy0 rtw_c2h_work [rtw_core]
RIP: 0010:rtw_fw_c2h_cmd_handle+0x127/0x380 [rtw_core]
CR2: 0000000000000000
R12: 0000000000000000 ← edcca_th = NULL
Call Trace:
<TASK>
rtw_c2h_work+0x49/0x70 [rtw_core]
process_scheduled_works+0x1f3/0x5e0
worker_thread+0x18d/0x340
</TASK>
note: kworker/u16:6[262] exited with irqs disabled
3) After the oops, processes hang in D-state indefinitely:
warp-svc, avahi-daemon, kdeconnectd — all survive SIGKILL,
making clean shutdown impossible.
== Suggested fix ==
Add a NULL check for edcca_th in rtw_fw_adaptivity_result():
 static void rtw_fw_adaptivity_result(struct rtw_dev *rtwdev, u8 *payload,
                                      u8 length)
 {
     const struct rtw_hw_reg_offset *edcca_th = rtwdev->chip->edcca_th;
     struct rtw_c2h_adaptivity *result = (struct rtw_c2h_adaptivity *)payload;
+    if (!edcca_th)
+        return;
+
     rtw_dbg(rtwdev, RTW_DBG_ADAPTIVITY,
     ...
Alternatively, the C2H_ADAPTIVITY case in rtw_fw_c2h_cmd_handle()
could check edcca_th before calling the handler. However, since
RTL8821CE firmware does occasionally send C2H_ADAPTIVITY commands
(as observed in this bug), you may have a better understanding of
whether a more complete fix is needed (e.g., defining edcca_th for
RTL8821C, or preventing the firmware from sending C2H_ADAPTIVITY
responses when the feature is not fully configured).
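To illustrate the intended behaviour of the guard, here is a small userspace model of the dispatch path. It is only a sketch: the mock_ types and functions are hypothetical, and the handler returns an int purely so the outcome can be observed (the real rtw_fw_adaptivity_result() returns void):

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_C2H_ADAPTIVITY 0x37	/* same ID value as in the report */

struct mock_hw_reg {
	uint32_t addr;
	uint32_t mask;
};

struct mock_chip_info {
	const struct mock_hw_reg *edcca_th;	/* NULL when the chip has no table */
};

struct mock_dev {
	const struct mock_chip_info *chip;
	unsigned int handled;			/* counts processed events */
};

/* Handler with the proposed guard: bail out when the chip has no table. */
static int mock_adaptivity_result(struct mock_dev *dev)
{
	const struct mock_hw_reg *edcca_th = dev->chip->edcca_th;

	if (!edcca_th)
		return -1;	/* nothing to report for this chip */

	/* From here on it is safe to index edcca_th[]. */
	dev->handled++;
	return 0;
}

/* Simplified dispatcher: only the adaptivity case is modelled. */
static int mock_c2h_cmd_handle(struct mock_dev *dev, uint8_t id)
{
	switch (id) {
	case MOCK_C2H_ADAPTIVITY:
		return mock_adaptivity_result(dev);
	default:
		return 0;
	}
}
```

In this model a device whose chip lacks the table returns -1 and is otherwise untouched, while a device with a table is processed normally. The same check could equally sit in the dispatcher's case branch, which is the alternative mentioned above.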
== Hardware ==
Machine: HP HP Notebook (board: 81F0, SKU: P3S95EA#ACB)
Adapter: RTL8821CE (PCI 10ec:c821, bus 0000:13:00.0)
CPU: Intel 5005U
Kernel: 6.19.9-2-cachyos (PREEMPT full, SMP PTI, LLVM build)
Driver: lwfinger/rtw88 out-of-tree with DMI quirk + rate v2
== Confirmation that this is NOT a local modification ==
I verified that:
- fw.c: rtw_fw_c2h_cmd_handle() and rtw_fw_adaptivity_result() are
byte-for-byte identical to torvalds/linux master and lwfinger/rtw88
- rtw8821c.c: rtw8821c_hw_spec does not define .edcca_th in any of
the three sources (mainline, lwfinger, our local copy)
- The only local modifications are your DMI quirk (pci.c + main.h)
and rate validation v2 (rx.c). Neither touches fw.c.
Please let me know if you need additional diagnostics or if you'd
like me to test a patch.
Best regards,
Oleksandr Havrylov <goainwo@xxxxxxxxx>
Tested on:
Kernel: 6.19.9-2-cachyos
Driver: lwfinger/rtw88 out-of-tree with DMI quirk + rate v2 patch
Distro: CachyOS (Arch-based)
On Wed, Mar 25, 2026 at 22:38, LB F <goainwo@xxxxxxxxx> wrote:
>
> Subject: Cross-platform analysis: RTL8821CE ASPM/LPS instability
> affects multiple OEM platforms beyond HP P3S95EA#ACB
>
> Hi Ping-Ke,
>
> First of all, thank you very much for your work on the rtw88 driver
> and for the time you spent helping us resolve the issues on our HP
> laptop. Both patches -- the v2 DMI quirk (ASPM + LPS Deep) and the
> v2 RX rate validation (rx.c) -- have been tested and verified stable
> on our system across two kernel updates (6.19.9-1 and 6.19.9-2),
> sustained network load (~1.9 GB), and multiple suspend/resume cycles.
> The system is now completely free of freezes, h2c errors, and
> mac80211 warnings. Your patches genuinely solved every issue we had.
>
> While working through this, I noticed that many other users across
> different hardware platforms appear to be experiencing the same
> problems that your patches resolved for us. I decided to collect
> and organize these observations in case they might be useful to you.
>
> Please note that this is an amateur analysis, not a professional
> one -- I am just a user trying to help. It is entirely possible
> that I have missed nuances or made incorrect assumptions. My only
> goal is to share what I found, in case it provides useful data
> points or sparks ideas for broader improvements. If any of this
> is not relevant or not useful, please feel free to disregard it.
>
>
> 1. KERNEL BUGZILLA: OPEN RTL8821CE REPORTS
> ==========================================
>
> I reviewed all open RTL8821CE bugs in kernel.org Bugzilla. Three
> of the six show symptoms that directly match the root causes
> addressed by your patches (ASPM deadlock and LPS Deep h2c failures).
>
> --- Directly correlated with ASPM/LPS patches ---
>
> Bug 215131 - System freeze (ASPM L1 deadlock)
> Title: "Realtek 8821CE makes the system freeze after 9e2fd29864c5
> (rtw88: add napi support)"
> Reporter: Kai-Heng Feng (Canonical)
> Kernel: 5.15+
> Symptoms: Hard freeze preceded by "pci bus timeout, check dma status"
> warnings. RX tag mismatch in rtw_pci_dma_check().
> Workaround confirmed by reporter: rtw88_pci.disable_aspm=1
> Reporter note: "disable_aspm=1 is not a viable workaround because
> it increases power consumption significantly"
> Status: OPEN since 2021-11-24.
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215131
> Relevance: Identical root cause to Bug 221195. The reporter's
> confirmed workaround (disable_aspm=1) is exactly what
> the DMI quirk implements.
>
> Bug 219830 - h2c/LPS failures + BT crackling
> Title: "rtw88_8821ce (RTL8821CE) slow performance and error
> messages in dmesg"
> Reporter: Bmax Y14 laptop, Fedora 41, kernel 6.13.5
> Symptoms: - "failed to send h2c command" (periodic)
> - "firmware failed to leave lps state" (periodic)
> - Lower signal strength vs Windows
> - Bluetooth crackling during audio playback
> Cross-ref: https://github.com/lwfinger/rtw88/issues/306
> Status: OPEN since 2025-03-02.
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219830
> Relevance: The h2c/lps errors are the same messages we observed
> before the DMI quirk disabled LPS Deep Mode. The BT
> crackling may correlate with the invalid RX rate
> condition addressed by your rx.c validation patch.
>
> Bug 218697 - TX queue flush timeout during scan
> Title: "rtw88_8821ce timed out to flush queue 2"
> Reporter: Arch Linux, kernel 6.8.4 / 6.8.5
> Symptoms: - "timed out to flush queue 2" every ~30 seconds
> - "failed to get tx report from firmware"
> - Stack trace: ieee80211_scan_work -> rtw_ops_flush ->
> rtw_mac_flush_queues timeout
> Status: OPEN since 2024-04-08.
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218697
> Relevance: The flush timeout occurs when the firmware cannot
> respond to TX queue operations -- consistent with
> firmware being stuck in LPS Deep during scan.
>
> --- Potentially related (no confirmed workaround data) ---
>
> Bug 217491 - "timed out to flush queue 1" regression (kernel 6.3)
> Manjaro user. Floods of "timed out to flush queue 1/2".
> Similar pattern to Bug 218697.
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217491
>
> Bug 217781 - Random sudden dropouts
> Arch user. Random disconnections during streaming/transfers.
> Reproduced on Ubuntu and Fedora (kernels 5.15 to 6.4).
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217781
>
> Bug 216685 - Low wireless speed
> Reduced throughput vs expected 802.11ac performance.
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216685
>
>
> 2. SYMPTOM-TO-PATCH MAPPING
> =============================
>
> dmesg signature                   Patch that resolves it
> ------------------------------    --------------------------------------
> Hard system freeze                pci.c DMI quirk (disable ASPM)
> "pci bus timeout, check dma"      pci.c DMI quirk (disable ASPM)
> "firmware failed to leave lps"    pci.c DMI quirk (disable LPS Deep)
> "failed to send h2c command"      pci.c DMI quirk (disable LPS Deep)
> "timed out to flush queue N"      pci.c DMI quirk (disable LPS Deep) [1]
> "failed to get tx report"         pci.c DMI quirk (disable LPS Deep) [1]
> VHT NSS=0 mac80211 WARNING        rx.c rate validation (v2)
>
> Confirmed in bugs: 215131, 219830, 218697, 221195.
> [1] Inferred: flush timeout occurs when firmware cannot exit LPS
> to process TX queue operations.
>
>
> 3. AFFECTED HARDWARE
> =====================
>
> Current DMI quirk coverage: HP P3S95EA#ACB only.
>
> Platforms confirmed affected in Bugzilla:
> Bug 221195: HP Notebook 81F0 (P3S95EA#ACB)
> Bug 215131: unknown (Canonical upstream testing)
> Bug 219830: Bmax Y14
> Bug 218697: unknown (Arch Linux user)
>
> Platforms reported in community forums as requiring
> disable_aspm=Y and/or disable_lps_deep=Y for stable RTL8821CE:
> - HP 17-by4063CL
> - Lenovo IdeaPad S145-15AST, IdeaPad 3, IdeaPad 330S
> - ASUS VivoBook X series
> - Acer Aspire 3/5 series
>
> All use PCI Device ID 10ec:c821 with different Subsystem IDs.
>
>
> 4. LPS_DEEP_MODE_LCLK IN THE rtw88 TREE
> =========================================
>
> I verified in the source which chips have the same
> lps_deep_mode_supported flag:
>
> Chip file     lps_deep_mode_supported         PCIe variant
> ----------    ----------------------------    ------------
> rtw8821c.c    BIT(LPS_DEEP_MODE_LCLK)         rtw8821ce ✓
> rtw8822c.c    BIT(LPS_DEEP_MODE_LCLK) | PG    rtw8822ce ✓
> rtw8822b.c    BIT(LPS_DEEP_MODE_LCLK)         rtw8822be ✓
> rtw8814a.c    BIT(LPS_DEEP_MODE_LCLK)         rtw8814ae ✓
> rtw8723d.c    0                               rtw8723de ✗
> rtw8703b.c    0                               (SDIO) -
> rtw8821a.c    0                               (legacy) -
>
> Source references:
> rtw8821c.c:2002 rtw8822c.c:5365 rtw8822b.c:2545
> rtw8814a.c:2211 rtw8723d.c:2144
>
> RTL8822CE community reports (Manjaro, Arch forums) confirm the
> same disable_aspm=Y + disable_lps_deep=Y workaround is effective,
> consistent with rtw8822c.c having LCLK enabled.
>
>
> 5. COMMUNITY WORKAROUND REFERENCES
> ====================================
>
> The following are concrete examples of forums and wikis where the
> same modprobe workarounds are actively recommended:
>
> Arch Wiki - RTW88 section:
> https://wiki.archlinux.org/title/Network_configuration/Wireless
> (section "RTW88" and "rtl8821ce" under Troubleshooting/Realtek)
> Recommends rtw88-dkms-git and documents the rtw88_8821ce issues.
>
> Arch Wiki - RTW89 section (same page):
> Documents the identical ASPM pattern for the newer RTW89 driver:
> options rtw89_pci disable_aspm_l1=y disable_aspm_l1ss=y
> options rtw89_core disable_ps_mode=y
> This suggests the ASPM/LPS interaction is a systemic Realtek
> design issue, not specific to rtw88 or the 8821CE chip.
>
> Arch Linux Forum - RTL8821CE thread:
> https://bbs.archlinux.org/viewtopic.php?id=273440
> Referenced by the Arch Wiki as the primary rtl8821ce discussion.
>
> lwfinger/rtw88 GitHub:
> https://github.com/lwfinger/rtw88/issues/306
> Directly cross-referenced by Bug 219830. The reporter describes h2c
> errors and investigated the antenna hardware (no fault found).
>
> lwfinger/rtw89 GitHub:
> https://github.com/lwfinger/rtw89/issues/275#issuecomment-1784155449
> Same ASPM-disable pattern documented for the newer RTW89 driver
> on HP and Lenovo laptops.
>
>
> 6. OBSERVATIONS
> ================
>
> a) Three open Bugzilla reports (215131, 219830, 218697) show
> symptoms identical to those resolved by your patches, but have
> no upstream fix available because their platforms are not the
> HP P3S95EA#ACB.
>
> b) Bug 215131 reporter (Kai-Heng Feng, Canonical) explicitly
> confirmed disable_aspm=1 as a workaround and called it
> "not viable" due to power cost. A DMI quirk for their
> platform would give them a proper fix.
>
> c) The Arch Wiki documents the same ASPM-disable pattern for
> both RTW88 and RTW89 drivers, suggesting the underlying
> hardware/firmware limitation is shared across generations.
>
> d) Asking Bugzilla reporters to provide their DMI data
> (dmidecode -t 1,2) could allow extending the quirk table
> with minimal effort and zero risk to unaffected platforms.
>
> e) The rx.c rate validation patch is chip-agnostic and requires
> no platform-specific consideration.
>
>
> 7. REFERENCES
> ==============
>
> Kernel Bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=215131
> https://bugzilla.kernel.org/show_bug.cgi?id=219830
> https://bugzilla.kernel.org/show_bug.cgi?id=218697
> https://bugzilla.kernel.org/show_bug.cgi?id=217491
> https://bugzilla.kernel.org/show_bug.cgi?id=217781
> https://bugzilla.kernel.org/show_bug.cgi?id=216685
>
> GitHub:
> https://github.com/lwfinger/rtw88/issues/306
> https://github.com/lwfinger/rtw89/issues/275
>
> Arch Wiki:
> https://wiki.archlinux.org/title/Network_configuration/Wireless
>
> Arch Linux Forum:
> https://bbs.archlinux.org/viewtopic.php?id=273440
>
> Source code (lps_deep_mode_supported):
> drivers/net/wireless/realtek/rtw88/rtw8821c.c:2002
> drivers/net/wireless/realtek/rtw88/rtw8822c.c:5365
> drivers/net/wireless/realtek/rtw88/rtw8822b.c:2545
> drivers/net/wireless/realtek/rtw88/rtw8814a.c:2211
> drivers/net/wireless/realtek/rtw88/rtw8723d.c:2144
>
>
> Best regards,
> Oleksandr Havrylov <goainwo@xxxxxxxxx>