RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: Ping-Ke Shih
Date: Tue Mar 10 2026 - 22:24:08 EST
Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
>
> LB F <goainwo@xxxxxxxxx> wrote:
> >
> > Hi Ping-Ke,
> >
> > Thank you for the incredibly fast response and assistance!
> >
> > > Can you dig through the kernel log (via netconsole or ramoops) for anything useful?
> > > I'd like to know whether this is a hardware-level freeze or whether the kernel
> > > can capture something going wrong.
> >
> > I managed to pull a call trace from a historic journald log just
> > before the system hung. The kernel gets trapped in an IRQ thread
> > inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
> > `ieee80211_rx_list` before everything freezes. Here is the relevant
> > snippet:
> >
> > ```text
> > Call Trace:
> > <IRQ>
> > ? __alloc_skb+0x23a/0x2a0
> > ? __alloc_skb+0x10c/0x2a0
> > ? __pfx_irq_thread_fn+0x10/0x10
> > [ ... truncated module list ... ]
> > Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
> > Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
> > RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
> > CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
> > rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
> > ```
> >
> > It behaves exactly like a PCIe bus deadlock or a hardware fault that
> > eventually brings down the CPU handling the IRQ.
>
> I wonder if malformed data is causing this trace and then leading to the
> kernel freeze. If we validate the RX data before calling
> ieee80211_rx_list(), maybe the trace disappears and everything will be fine,
> with no workaround needed at all.
>
> >
> > > Are all of these needed to work around the problem, or is disable_aspm enough?
> > > I'd list them in order of power consumption impact:
> > > 1. disable_aspm=y
> > > 2. disable_lps_deep=y
> > > 3. disable WiFi power save
> >
> > To verify which parameters are strictly necessary, I performed
> > isolated testing today. I ensured no other modprobe configs were
> > active, rebuilt the initramfs, and manually enforced that
> > `wifi.powersave` was active via `iw dev wlan0 set power_save on`
> > during all tests (as the OS power management profiles were defaulting
> > it to off, which initially masked the issue).
> >
> > I tested each workaround individually across multiple sleep/wake
> > cycles and active usage:
> >
> > **Test 1 (ASPM Disabled, LPS Deep Enabled):**
> > - Module parameters: `rtw88_pci disable_aspm=y` (and
> > `rtw88_core disable_lps_deep=n`)
> > - Result: Stable. No freezes were observed during usage or transitions
> > into/out of S3 sleep while power saving was enforced.
> >
> > **Test 2 (ASPM Enabled, LPS Deep Disabled):**
> > - Module parameters: `rtw88_core disable_lps_deep=y` (and
> > `rtw88_pci disable_aspm=n`)
> > - Result: Stable. No freezes were observed under the same forced power
> > save conditions.
> >
> > **Conclusion:** It appears we do not need both workarounds
> > simultaneously for this specific hardware. Using only `disable_aspm=y`
> > seems to be sufficient to prevent the system freeze. Given your note
> > about the power consumption impact ranking, this looks like the
> > optimal path forward.
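> >
> > For reference, a minimal sketch of the Test 1 setup as a modprobe.d
> > fragment. The file name is an assumption; the module and parameter
> > names are the ones used in the tests above:
> >
> > ```text
> > # /etc/modprobe.d/rtw88-aspm.conf (hypothetical file name)
> > # Disable PCIe ASPM for the rtw88 PCI driver; LPS deep mode is left enabled.
> > options rtw88_pci disable_aspm=y
> > ```
> >
> > After rebuilding the initramfs and rebooting, the value should be
> > readable from /sys/module/rtw88_pci/parameters/disable_aspm.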
>
> Let's test my RFT patch to disable ASPM then.
>
> >
> > > But what does 'deadlock' mean? As far as I know, NAPI poll is scheduled by the ISR
> > > and then receives packets. The rx_no_aspm workaround forcibly turns
> > > off ASPM during this period.
> >
> > By "deadlock" I meant a hardware-level bus lockup. It seems the
> > physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
> > when trying to negotiate waking up from ASPM L1 while simultaneously
> > existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
> > helps during active Rx decoding, but the laptop often freezes while
> > completely idle, presumably when the AP sends a basic beacon, the chip
> > attempts to leave LPS Deep + L1, and the hardware simply gives up and
> > halts the system.
>
> I think this is your own interpretation and inference, right? Did you measure
> real hardware signals?
>
> My point is that if this is a hardware-level bus lockup, let's apply a
> quirk. If malformed data is causing the kernel to hang, I'd add a sanity check
> on the RX data, but for now I don't actually know what we should check.
>
> >
> > > We have not modified RTL8821CE for a long time, so I'd add a workaround
> > > for the specific platform as mentioned above.
> >
> > Adding a DMI/platform quirk specifically for this laptop to disable
> > ASPM would be wonderful and deeply appreciated. I agree it is safer
> > than touching the global flags for hardware that is functioning
> > correctly out in the wild.
> >
> > Here is the exact identifying information for my system:
> >
> > System Vendor: HP
> > Product Name: HP Notebook
> > SKU Number: P3S95EA#ACB
> > Family: 103C_5335KV
> > PCI ID: 10ec:c821
> > Subsystem ID: 103c:831a
> >
> > I am completely ready to test any patch or quirk you send my way.
> > Thank you so much for your time and helping track this down!
>
> I sent an RFT patch [1] for testing. Please check whether it works on your HP notebook.
> If you check the rtw88 log, you can see I added a similar patch 5 years ago,
> which was later replaced by the preferred "rtwpci->rx_no_aspm" change. I
> think that change resolves the problem on only some notebooks, though....
>
> [1]
> https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/T/#u
I forgot to ask: could you share your full name so I can credit you as the
reporter in the commit message?