Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)

From: LB F

Date: Wed Mar 11 2026 - 11:27:48 EST


Hi Ping-Ke,

I successfully applied your patch out-of-tree and performed rigorous
testing on the host machine.

I can officially confirm that the patch works flawlessly. The DMI
quirk triggered correctly and successfully prevented the
hardware-level PCIe bus lockups on my HP P3S95EA#ACB.

Testing Environment & Methodology:
- Kernel: CachyOS Linux 6.19.6-2-cachyos x86_64
- Toolchain: Clang/LLVM 21.1.8 (`make CC=clang LLVM=1 modules`)
- Extraction: We fetched only the
`drivers/net/wireless/realtek/rtw88` subtree of the
torvalds/linux `v6.19` tree using `git sparse-checkout`, so the patch
could be applied cleanly without compiling the entire 2.5GB+ kernel.
- The resulting `.ko` object files were compressed to `.zst` and
installed successfully over the generic CachyOS system driver objects.

Verification Conditions:
- Removed ALL local workarounds. `disable_aspm=Y` is no longer forced
via `/etc/modprobe.d/` overrides.
- Power saving remains natively ON (`wifi.powersave = 3`, managed by
NetworkManager).
- Left the laptop completely idle for multiple 5-10 minute periods so
the adapter would enter its sleep modes.

Post-Boot Log Analysis & Potential Improvement Proposition:
The system remained 100% stable without any kernel panics or UI freezes.
However, I continuously monitored the `dmesg` ring buffer and noticed
an intriguing behavior. While the laptop sits completely idle
(NetworkManager connected, but no active traffic), the `rtw88` driver
floods the logs with thousands of firmware errors:

[ 1084.746485] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
[ 1084.749662] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 1084.752895] rtw88_8821ce 0000:13:00.0: failed to send h2c command
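
To put a number on the flood, the two messages above can be tallied
from `dmesg` output. A minimal sketch, assuming each log line is handed
to a helper as a C string; `rtw_log_tally` and `rtw_log_tally_line` are
hypothetical names made up for this illustration, not part of rtw88 or
any existing tool:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Running counts of the two rtw88 messages quoted above. */
struct rtw_log_tally {
	unsigned long lps_leave_failures; /* "firmware failed to leave lps state" */
	unsigned long h2c_send_failures;  /* "failed to send h2c command" */
};

/*
 * Hypothetical helper: classify one dmesg line and bump the matching
 * counter. Returns 1 if the line matched either message, 0 otherwise.
 */
int rtw_log_tally_line(struct rtw_log_tally *tally, const char *line)
{
	assert(tally != NULL && line != NULL);

	/* Only consider lines emitted by the RTL8821CE driver instance. */
	if (strstr(line, "rtw88_8821ce") == NULL)
		return 0;

	if (strstr(line, "firmware failed to leave lps state") != NULL) {
		tally->lps_leave_failures++;
		return 1;
	}
	if (strstr(line, "failed to send h2c command") != NULL) {
		tally->h2c_send_failures++;
		return 1;
	}
	return 0;
}
```

Calling it on every line read from `dmesg` (for example with `fgets` on
a pipe) gives separate counts for LPS-exit failures and H2C send
failures, which is easier to compare across test runs than scrolling
through the ring buffer.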

If my understanding of this architecture is correct, previously, when
ASPM wasn't disabled, this exact failure of the adapter firmware inside
`LPS_DEEP_MODE_LCLK` would violently lock up the PCIe bus and crash
the host. Now, thanks to your DMI ASPM quirk at the `rtw88_pci` level,
the host PCIe controller doesn't enter `L1` and is perfectly shielded
from the adapter locking itself up! The OS handles the timeouts
gracefully and driver recovery prevents a hard freeze.

A question for your consideration: Given the immense volume of these
`h2c` timeout errors (and the underlying firmware's fundamental
inability to cleanly enter/exit its own sleep states without L1
participation on this HP model), do you think it would be beneficial
to *also* dynamically disable LPS Deep sleep when this specific ASPM
quirk is triggered?

For example, dynamically forcing `rtwdev->lps_conf.deep_mode =
LPS_DEEP_MODE_NONE` when the DMI ASPM flag is active, strictly to
prevent the firmware from attempting a sleep cycle that is doomed to
fail and polluting the queues and logs? It might also spare the CPU
the small amount of work spent on repeated H2C polling timeouts.
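
To make the idea concrete, here is a minimal userspace sketch of the
proposed policy. It is not driver code: the structs are stand-ins,
`apply_lps_deep_policy` and the `aspm_quirk_active` flag are
hypothetical names, and only `lps_conf.deep_mode`, `LPS_DEEP_MODE_NONE`
and `LPS_DEEP_MODE_LCLK` are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the driver's deep-sleep mode enum (names from this mail). */
enum lps_deep_mode {
	LPS_DEEP_MODE_NONE,
	LPS_DEEP_MODE_LCLK,
};

/* Stand-ins for the driver structures; only deep_mode is modelled. */
struct lps_conf_sketch {
	enum lps_deep_mode deep_mode;
};

struct rtw_dev_sketch {
	struct lps_conf_sketch lps_conf;
};

/*
 * Hypothetical policy hook: if the platform ASPM quirk matched, also
 * turn off LPS deep mode so the firmware never attempts the sleep cycle
 * that produces the h2c timeouts. Otherwise leave the mode untouched.
 */
void apply_lps_deep_policy(struct rtw_dev_sketch *rtwdev, bool aspm_quirk_active)
{
	assert(rtwdev != NULL);

	if (aspm_quirk_active)
		rtwdev->lps_conf.deep_mode = LPS_DEEP_MODE_NONE;
}
```

The point of keeping it as a single conditional is that machines which
do not match the quirk keep their current deep-sleep behaviour and power
savings; where such a hook would actually belong in rtw88 is for you to
judge.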

If you believe that simply letting the driver recover and tolerating
the error spam in `dmesg` is the preferred/safer upstream approach, I
am perfectly happy. The patch functions as advertised and system
stability is unequivocally restored!

Thank you immensely for your rapid debugging and definitive patch for
this long-standing issue and for bringing stability to this model.

Tested-by: Oleksandr Havrylov <goainwo@xxxxxxxxx>

*(Note: I was a bit unsure which of the two active mailing list
threads was the most appropriate place for this final report — the
original bug discussion or the new RFT patch submission thread — so I
replied to both just to ensure it is correctly attached to the patch.
Apologies for the duplicate email!)*

Best regards,
Oleksandr Havrylov

On Wed, Mar 11, 2026 at 13:00, LB F <goainwo@xxxxxxxxx> wrote:
>
> Hi Ping-Ke,
>
> Thank you for the incredibly fast turnaround and for providing the RFT
> patch with the DMI quirk!
>
> First, I want to mention that I am not an IT professional or a
> programmer. I am just a regular Linux user who really wants to help
> solve this problem. I am trying my best to verify everything
> carefully, so please forgive me if my terminology or induction was
> slightly off.
>
> To answer your clarifying questions from the previous emails:
>
> > Just want to clarify that these logs only appear in test 3, right?
> > No these logs in test 1/2.
>
> Yes, exactly. The `failed to send h2c command` errors only caused a
> complete system freeze when no workarounds were active and the adapter
> attempted to sleep (Test 3).
>
> > I think this is your perspective and induction, right? Did you measure
> > real hardware signals?
>
> You are entirely correct. This is just my induction based solely on
> the timing of the logs and system behavior. I do not have access to an
> oscilloscope or any hardware diagnostic tools. Given this, I
> completely agree that your approach of applying a platform-specific
> quirk is the safest and best solution.
>
> > Forgot to say. Could you share your full name for me as a reporter
> > in commit message?
>
> My full name is Oleksandr Havrylov. I would be honored to be included
> as the reporter in the commit message.
>
> ### Recent Baseline Testing Before Your Patch
>
> Before applying your patch today, we ran a few more controlled tests
> to double-check our baseline. We verified that our local workaround
> (`modprobe.d disable_aspm=y`) **does indeed keep the system completely
> stable** and prevents the hard freeze, even when NetworkManager's
> `wifi.powersave` is set to ON (default).
>
> However, we noticed one interesting detail in the kernel logs: while
> the system no longer freezes with `disable_aspm=y`, `dmesg` still
> constantly logs `firmware failed to leave lps state` and `failed to
> send h2c command` when the laptop is completely idle. It seems the
> firmware still crashes during LPS, but because ASPM is disabled, the
> PCIe bus ignores the crash and the system survives perfectly fine. I
> just wanted to mention this for completeness!
>
> ### Testing Plan
>
> I have **not** applied your RFT patch just yet. I wanted to make sure
> our testing baseline was 100% clean and documented first.
>
> I will compile your patch and perform rigorous testing this evening (I
> am in the EET timezone, Ukraine). I will test it with the native
> `power_save` fully enabled to ensure your patch successfully prevents
> the hard lockups as intended.
>
> I will stay in touch and reply back to this thread with a formal
> `Tested-by` confirmation (and any logs if needed) as soon as my
> testing is complete. Thank you again for all your help!
>
> Best regards,
> Oleksandr Havrylov
>
> On Wed, Mar 11, 2026 at 04:22, Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
> >
> > Ping-Ke Shih <pkshih@xxxxxxxxxxx> wrote:
> > >
> > > LB F <goainwo@xxxxxxxxx> wrote:
> > > >
> > > > Hi Ping-Ke,
> > > >
> > > > Thank you for the incredibly fast response and assistance!
> > > >
> > > > > Can you dig kernel log (by netconsole or ramoops) if something useful?
> > > > > I'd like to know this is hardware level freeze or kernel can capture
> > > > > something wrong.
> > > >
> > > > I managed to pull a call trace from a historic journald log just
> > > > before the system hung. The kernel gets trapped in an IRQ thread
> > > > inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
> > > > `ieee80211_rx_list` before everything freezes. Here is the relevant
> > > > snippet:
> > > >
> > > > ```text
> > > > Call Trace:
> > > > <IRQ>
> > > > ? __alloc_skb+0x23a/0x2a0
> > > > ? __alloc_skb+0x10c/0x2a0
> > > > ? __pfx_irq_thread_fn+0x10/0x10
> > > > [ ... truncated module list ... ]
> > > > Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
> > > > Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
> > > > RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
> > > > CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
> > > > rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
> > > > ```
> > > >
> > > > It behaves exactly like a PCIe bus deadlock or a hardware fault that
> > > > eventually brings down the CPU handling the IRQ.
> > >
> > > I wonder if there is malformed data causing this trace, which then
> > > leads the kernel to freeze. If we validate RX data before calling
> > > ieee80211_rx_list(), maybe the trace disappears and everything will be
> > > fine, with no workaround needed at all?
> > >
> > > >
> > > > > Are these totally needed to workaround the problem? Or disable_aspm is enough?
> > > > > I'd list them in order of power consumption impact:
> > > > > 1. disable_aspm=y
> > > > > 2. disable_lps_deep=y
> > > > > 3. disable WiFi power save
> > > >
> > > > To verify which parameters are strictly necessary, I performed
> > > > isolated testing today. I ensured no other modprobe configs were
> > > > active, rebuilt the initramfs, and manually enforced that
> > > > `wifi.powersave` was active via `iw dev wlan0 set power_save on`
> > > > during all tests (as the OS power management profiles were defaulting
> > > > it to off, which initially masked the issue).
> > > >
> > > > I tested each workaround individually across multiple sleep/wake
> > > > cycles and active usage:
> > > >
> > > > **Test 1 (ASPM Disabled, LPS Deep Enabled):**
> > > > - Kernel parameters: `rtw88_pci disable_aspm=y` (and `rtw88_core
> > > > disable_lps_deep=n`)
> > > > - Result: Stable. No freezes were observed during usage or transitions
> > > > into/out of S3 sleep while power saving was enforced.
> > > >
> > > > **Test 2 (ASPM Enabled, LPS Deep Disabled):**
> > > > - Kernel parameters: `rtw88_core disable_lps_deep=y` (and `rtw88_pci
> > > > disable_aspm=n`)
> > > > - Result: Stable. No freezes were observed under the same forced power
> > > > save conditions.
> > > >
> > > > **Conclusion:** It appears we do not need both workarounds
> > > > simultaneously for this specific hardware. Using only `disable_aspm=y`
> > > > seems to be sufficient to prevent the system freeze. Given your note
> > > > about the power consumption impact ranking, this looks like the
> > > > optimal path forward.
> > >
> > > Let's test my RFT patch to disable ASPM then.
> > >
> > > >
> > > > > But what does 'deadlock' mean? As far as I know, NAPI poll is
> > > > > scheduled by the ISR to receive packets. The rx_no_aspm workaround
> > > > > forcibly turns off ASPM during this period.
> > > >
> > > > By "deadlock" I meant a hardware-level bus lockup. It seems the
> > > > physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
> > > > when trying to negotiate waking up from ASPM L1 while simultaneously
> > > > existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
> > > > helps during active Rx decoding, but the laptop often freezes while
> > > > completely idle, presumably when the AP sends a basic beacon, the chip
> > > > attempts to leave LPS Deep + L1, and the hardware simply gives up and
> > > > halts the system.
> > >
> > > I think this is your perspective and induction, right? Did you measure
> > > real hardware signals?
> > >
> > > My point is that if this is a hardware-level bus lockup, let's apply
> > > quirk. If some malformed data causing kernel hangs, I'd add sanity check
> > > on RX data, but I don't actually know what we should check for now.
> > >
> > > >
> > > > > We have not modified RTL8821CE for a long time, so I'd add workaround
> > > > > to specific platform as mentioned above.
> > > >
> > > > Adding a DMI/platform quirk specifically for this laptop to disable
> > > > ASPM would be wonderful and deeply appreciated. I agree it is safer
> > > > than touching the global flags for hardware that is functioning
> > > > correctly out in the wild.
> > > >
> > > > Here is the exact identifying information for my system:
> > > >
> > > > System Vendor: HP
> > > > Product Name: HP Notebook
> > > > SKU Number: P3S95EA#ACB
> > > > Family: 103C_5335KV
> > > > PCI ID: 10ec:c821
> > > > Subsystem ID: 103c:831a
> > > >
> > > > I am completely ready to test any patch or quirk you send my way.
> > > > Thank you so much for your time and helping track this down!
> > >
> > > I sent an RFT [1] for testing. Please check whether it works on your HP
> > > notebook. If you check the rtw88 log, you can see I added a similar patch
> > > 5 years ago, and it was later replaced by the preferred
> > > "rtwpci->rx_no_aspm" change, which I think can only resolve the problem
> > > on some notebooks though....
> > >
> > > [1]
> > > https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/T/#u
> >
> > Forgot to say. Could you share your full name for me as a reporter
> > in commit message?
> >
> >