Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
From: Luka Gejak
Date: Fri May 01 2026 - 16:47:23 EST
On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@xxxxxxxxx> wrote:
>On 01/05/2026 18:04, luka.gejak@xxxxxxxxx wrote:
>> From: Luka Gejak <luka.gejak@xxxxxxxxx>
>>
>> The driver expects the firmware to report TX status within 500ms.
>> However, a race condition exists when the hardware is under heavy TX
>> load and is simultaneously interrupted by background scans or
>> power-saving state transitions. During these events, the firmware may
>> go off-channel for longer than 500ms, delaying the TX reports.
>>
Hi Bitterblue,

Thanks for the review.
>
>But power saving state transitions should not happen during heavy TX load.
>
You are absolutely right that power save transitions don't happen
during heavy TX. The issue is strictly tied to off-channel dwell time.
I reliably trigger this on my rtl8723du (USB) by forcing background
scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
goes off-channel to scan, which delays the TX report well beyond the
current 500ms threshold.
>> When this happens, the purge timer fires prematurely, dropping the
>> tracking skbs from the queue and spamming the kernel log with:
>> "failed to get tx report from firmware". Dropping these tracking skbs
>> prevents the driver from reporting TX status back to mac80211, which
>> breaks rate control accounting and degrades performance.
>>
>
>But mac80211 doesn't handle rate control for these chips. How much does
>performance degrade?
>
I understand the firmware handles that internally. The performance
degradation I am actually seeing is TCP window collapse, as the host
stack interprets the dropped tracking skbs as packet loss. In my
testing with iperf3, throughput drops from a steady 80-90 Mbps to
near-zero for nearly 2 seconds following the scan before recovery
begins.
>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>> to comfortably accommodate the duration of full WiFi background scans
>> and sleep transitions without incorrectly tripping the purge timer,
>> while still eventually catching true firmware lockups.
>>
>
>rtw88 supports many chips. Which one are you using?
>
>Perhaps provide a full description of the problem you encountered.
>
...
I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
2500ms is too heavy-handed. Since this impacts all rtw88 chips,
including PCIe variants where 500ms might be exactly what is needed to
catch a real firmware lockup, the blast radius is too large. How would
you prefer I handle this for the v2 patch? I can either implement a
more conservative global bump, or make the timeout dynamic based on
the HCI interface so USB devices get a longer timeout to accommodate
the bus latency during scans.
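
To make the second option concrete, here is a rough userspace sketch of
what I have in mind. It is not the actual driver code: the enum, the
macro names and both helpers are placeholders I made up for
illustration, and the 2500ms USB value is just the number from v1, not
something I am proposing as final.

```c
#include <assert.h>

/* Stand-in for the driver's HCI type; these names are illustrative only. */
enum hci_type {
	HCI_TYPE_PCIE,
	HCI_TYPE_USB,
	HCI_TYPE_SDIO,
};

#define TX_PROBE_TIMEOUT_DEFAULT_MS	500	/* current threshold */
#define TX_PROBE_TIMEOUT_USB_MS		2500	/* value from the v1 patch */

/* A longer USB timeout only makes sense if it is not below the default. */
static_assert(TX_PROBE_TIMEOUT_USB_MS >= TX_PROBE_TIMEOUT_DEFAULT_MS,
	      "USB timeout must not be shorter than the default");

/* Hypothetical helper: choose the TX report timeout from the bus type. */
static unsigned int tx_probe_timeout_ms(enum hci_type type)
{
	switch (type) {
	case HCI_TYPE_USB:
		return TX_PROBE_TIMEOUT_USB_MS;
	default:
		/* PCIe and SDIO keep the existing behaviour. */
		return TX_PROBE_TIMEOUT_DEFAULT_MS;
	}
}

/*
 * Hypothetical helper: returns 1 if a TX report delayed by delay_ms
 * would arrive only after the purge timer has already fired.
 */
static int tx_report_purged(enum hci_type type, unsigned int delay_ms)
{
	return delay_ms >= tx_probe_timeout_ms(type);
}
```

With an assumed report delay of 2000ms, tx_report_purged() returns 1
for PCIe and 0 for USB, so only USB devices would tolerate the longer
off-channel period while PCIe keeps the existing 500ms lockup
detection.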
Best regards,
Luka Gejak