Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition

From: Luka Gejak

Date: Fri May 01 2026 - 17:33:53 EST


On May 1, 2026 11:28:38 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@xxxxxxxxx> wrote:
>On 01/05/2026 23:46, Luka Gejak wrote:
>> On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@xxxxxxxxx> wrote:
>>> On 01/05/2026 18:04, luka.gejak@xxxxxxxxx wrote:
>>>> From: Luka Gejak <luka.gejak@xxxxxxxxx>
>>>>
>>>> The driver expects the firmware to report TX status within 500ms.
>>>> However, a race condition exists when the hardware is under heavy TX
>>>> load and is simultaneously interrupted by background scans or
>>>> power-saving state transitions. During these events, the firmware may
>>>> go off-channel for longer than 500ms, delaying the TX reports.
>>>>
>> Hi Bitterblue,
>> thanks for the review.
>>>
>>> But power saving state transitions should not happen during heavy TX load.
>>>
>> You are absolutely right that power save transitions don't happen
>> during heavy TX. The issue is strictly tied to off-channel dwell time.
>> I reliably trigger this on my rtl8723du (USB) by forcing background
>> scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
>> goes off-channel to scan, which delays the TX report well beyond the
>> current 500ms threshold.
>>
>>>> When this happens, the purge timer fires prematurely, dropping the
>>>> tracking skbs from the queue and spamming the kernel log with:
>>>> "failed to get tx report from firmware". Dropping these tracking skbs
>>>> prevents the driver from reporting TX status back to mac80211, which
>>>> breaks rate control accounting and degrades performance.
>>>>
>>>
>>> But mac80211 doesn't handle rate control for these chips. How much does
>>> performance degrade?
>>>
>>
>> I understand the firmware handles that internally. The performance
>> degradation I am actually seeing is TCP window collapse, as the host
>> stack interprets the dropped tracking skbs as packet loss. In my
>> testing with iperf3, throughput drops from a steady 80-90 Mbps to
>> near-zero for nearly 2 seconds following the scan before recovery
>> begins.
>>
>>>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>>>> to comfortably accommodate the duration of full WiFi background scans
>>>> and sleep transitions without incorrectly tripping the purge timer,
>>>> while still eventually catching true firmware lockups.
>>>>
>>>
>>> rtw88 supports many chips. Which one are you using?
>>>
>>> Perhaps provide a full description of the problem you encountered.
>>>
>>
>> ...
>>
>> I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
>> 2500ms is too heavy-handed. Since this impacts all rtw88 chips,
>> including PCIe variants where 500ms might be exactly what is needed to
>> catch a real firmware lockup, the blast radius is too large. How would
>> you prefer I handle this for the v2 patch? I can either implement a
>> more conservative global bump, or make the timeout dynamic based on
>> the HCI interface so USB devices get a longer timeout to accommodate
>> the bus latency during scans.
>>
>> Best regards,
>> Luka Gejak
>
>No idea, I'm just asking some questions...
>
>Actually, I have one more: what version of the driver did you test?
>
>My quick test with RTL8723DU doesn't show any "failed to get tx report
>from firmware" when scanning while running iperf3. Does it take a long
>time to trigger?

I am testing against the latest wireless-next tree.
Yes, it can take a while to trigger. It is an intermittent race, so it
does not show up in every test run. To reproduce this, I
use a script to sustain heavy TX load while forcing background scans
in a loop. Under this stress, it typically manifests after a few
minutes of operation.
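
A minimal sketch of such a reproduction loop is below. It is an
illustration under assumptions, not the exact script: the interface
name, the iperf3 server address, the round count and the scan interval
are all placeholders, and the DRY_RUN switch is a hypothetical safety
default that only prints the commands. Set DRY_RUN=0 and run it as root
to actually generate load and force scans.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; every value below is a placeholder.
IFACE="${IFACE:-wlan0}"              # assumed interface name
SERVER="${SERVER:-192.0.2.1}"        # assumed iperf3 server address
ROUNDS="${ROUNDS:-60}"               # number of forced scans
SCAN_INTERVAL="${SCAN_INTERVAL:-5}"  # seconds to wait between scans
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same as run(), but discard stdout of the executed command.
run_quiet() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@" >/dev/null
    fi
}

repro() {
    # Keep TX load running longer than the scan loop is expected to
    # take; the extra 10 s per round is a rough allowance for the scan.
    duration=$((ROUNDS * (SCAN_INTERVAL + 10)))
    run_quiet iperf3 -c "$SERVER" -t "$duration" &
    load_pid=$!

    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        # Force a scan while traffic is flowing.
        run_quiet iw dev "$IFACE" scan
        run sleep "$SCAN_INTERVAL"
        i=$((i + 1))
    done

    # Stop the load generator if it is still running.
    kill "$load_pid" 2>/dev/null
    wait "$load_pid" 2>/dev/null

    # Count how often the purge timer complained during the run.
    run sh -c 'dmesg | grep -c "failed to get tx report from firmware"'
}

repro
```

With the defaults this only prints the command sequence. For a real
run, something like "DRY_RUN=0 IFACE=wlanX SERVER=<server> ./repro.sh"
would be used, with both values adjusted to the local setup.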

Best regards,
Luka Gejak