RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: Ping-Ke Shih

Date: Tue Mar 10 2026 - 22:20:15 EST


LB F <goainwo@xxxxxxxxx> wrote:
>
> Hi Ping-Ke,
>
> Thank you for the incredibly fast response and assistance!
>
> > Can you dig kernel log (by netconsole or ramoops) if something useful?
> > I'd like to know this is hardware level freeze or kernel can capture something
> > wrong.
>
> I managed to pull a call trace from a historic journald log just
> before the system hung. The kernel gets trapped in an IRQ thread
> inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
> `ieee80211_rx_list` before everything freezes. Here is the relevant
> snippet:
>
> ```text
> Call Trace:
> <IRQ>
> ? __alloc_skb+0x23a/0x2a0
> ? __alloc_skb+0x10c/0x2a0
> ? __pfx_irq_thread_fn+0x10/0x10
> [ ... truncated module list ... ]
> Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
> Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
> RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
> CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
> rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
> ```
>
> It behaves exactly like a PCIe bus deadlock or a hardware fault that
> eventually brings down the CPU handling the IRQ.

I wonder whether malformed RX data is causing this trace and then
leading to the kernel freeze. If we validate RX data before calling
ieee80211_rx_list(), maybe the trace disappears and everything works,
with no need for a workaround at all.
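
To illustrate the kind of validation I have in mind, here is a minimal
userspace sketch. It is not rtw88 code: struct rx_meta, its fields,
MIN_FRAME_LEN and rx_meta_is_sane() are hypothetical names, and the
checks are only an assumption about what might be worth testing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the lengths reported for one RX buffer.
 * This is NOT the real rtw88 RX descriptor layout. */
struct rx_meta {
	uint32_t desc_sz;     /* size of the RX descriptor itself */
	uint32_t drv_info_sz; /* driver info area after the descriptor */
	uint32_t shift;       /* padding before the 802.11 frame */
	uint32_t pkt_len;     /* length of the 802.11 frame */
};

/* Assumption: frames shorter than this are not worth passing up. */
#define MIN_FRAME_LEN 10u

/* Return true only if the reported lengths fit inside the RX buffer. */
static bool rx_meta_is_sane(const struct rx_meta *m, size_t buf_size)
{
	uint64_t total;

	if (m->pkt_len < MIN_FRAME_LEN)
		return false;

	/* Sum in 64 bits so garbage lengths cannot wrap around. */
	total = (uint64_t)m->desc_sz + m->drv_info_sz + m->shift + m->pkt_len;

	return total <= buf_size;
}
```

A frame failing such a check would be dropped instead of being handed
to mac80211. Whether a length check alone would catch the data behind
this trace is unknown.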

>
> > Are these totally needed to workaround the problem? Or disable_aspm is enough?
> > I'd list them in order of power consumption impact:
> > 1. disable_aspm=y
> > 2. disable_lps_deep=y
> > 3. disable WiFi power save
>
> To verify which parameters are strictly necessary, I performed
> isolated testing today. I ensured no other modprobe configs were
> active, rebuilt the initramfs, and manually enforced that
> `wifi.powersave` was active via `iw dev wlan0 set power_save on`
> during all tests (as the OS power management profiles were defaulting
> it to off, which initially masked the issue).
>
> I tested each workaround individually across multiple sleep/wake
> cycles and active usage:
>
> **Test 1 (ASPM Disabled, LPS Deep Enabled):**
> - Kernel parameters: `rtw88_pci disable_aspm=y` (and `rtw88_core
> disable_lps_deep=n`)
> - Result: Stable. No freezes were observed during usage or transitions
> into/out of S3 sleep while power saving was enforced.
>
> **Test 2 (ASPM Enabled, LPS Deep Disabled):**
> - Kernel parameters: `rtw88_core disable_lps_deep=y` (and `rtw88_pci
> disable_aspm=n`)
> - Result: Stable. No freezes were observed under the same forced power
> save conditions.
>
> **Conclusion:** It appears we do not need both workarounds
> simultaneously for this specific hardware. Using only `disable_aspm=y`
> seems to be sufficient to prevent the system freeze. Given your note
> about the power consumption impact ranking, this looks like the
> optimal path forward.

Let's test my RFT patch to disable ASPM then.

>
> > But what does 'deadlock' mean? As I know NAPI poll is scheduled by ISR,
> > and going to receive packets. The rx_no_aspm workaround is to forcely turn
> > off ASPM during this period.
>
> By "deadlock" I meant a hardware-level bus lockup. It seems the
> physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
> when trying to negotiate waking up from ASPM L1 while simultaneously
> existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
> helps during active Rx decoding, but the laptop often freezes while
> completely idle, presumably when the AP sends a basic beacon, the chip
> attempts to leave LPS Deep + L1, and the hardware simply gives up and
> halts the system.

I think this is your inference rather than a measurement, right? Did
you measure real hardware signals?

My point is that if this is a hardware-level bus lockup, let's apply a
quirk. If malformed data is causing the kernel to hang, I'd add a sanity
check on RX data, but for now I don't actually know what we should check.

>
> > We have not modified RTL8821CE for a long time, so I'd add workaround
> > to specific platform as mentioned above.
>
> Adding a DMI/platform quirk specifically for this laptop to disable
> ASPM would be wonderful and deeply appreciated. I agree it is safer
> than touching the global flags for hardware that is functioning
> correctly out in the wild.
>
> Here is the exact identifying information for my system:
>
> System Vendor: HP
> Product Name: HP Notebook
> SKU Number: P3S95EA#ACB
> Family: 103C_5335KV
> PCI ID: 10ec:c821
> Subsystem ID: 103c:831a
>
> I am completely ready to test any patch or quirk you send my way.
> Thank you so much for your time and helping track this down!

I sent an RFT [1] for testing. Please check whether it works on your HP
notebook. If you check the rtw88 log, you can see I added a similar patch
5 years ago, which was replaced by the preferred "rtwpci->rx_no_aspm"
change, though I think that change resolves the problem on only some
notebooks.
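
For readers unfamiliar with platform quirks, the matching idea can be
modeled in userspace as below. This is only a sketch and not the content
of the RFT: struct aspm_quirk, aspm_quirks[] and
platform_needs_aspm_off() are hypothetical names. In the kernel the
lookup would normally go through struct dmi_system_id and
dmi_check_system() rather than hand-written string compares. The match
values are taken from the system information quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical quirk entry: a userspace model, not the kernel's
 * struct dmi_system_id. */
struct aspm_quirk {
	const char *sys_vendor;
	const char *product_name;
	uint16_t subsys_vendor;
	uint16_t subsys_device;
};

/* Values from the report: HP / HP Notebook, subsystem 103c:831a. */
static const struct aspm_quirk aspm_quirks[] = {
	{ "HP", "HP Notebook", 0x103c, 0x831a },
};

/* Return true if the platform matches an entry that needs ASPM off. */
static bool platform_needs_aspm_off(const char *vendor, const char *product,
				    uint16_t subsys_vendor,
				    uint16_t subsys_device)
{
	size_t i;

	for (i = 0; i < sizeof(aspm_quirks) / sizeof(aspm_quirks[0]); i++) {
		const struct aspm_quirk *q = &aspm_quirks[i];

		if (strcmp(vendor, q->sys_vendor) == 0 &&
		    strcmp(product, q->product_name) == 0 &&
		    subsys_vendor == q->subsys_vendor &&
		    subsys_device == q->subsys_device)
			return true;
	}

	return false;
}
```

"HP Notebook" is a very generic product name, so the sketch also compares
the PCI subsystem ID to keep the match narrow.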

[1] https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@xxxxxxxxxxx/T/#u

Ping-Ke