Re: [PATCH 07/55] staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps

From: Felix Fietkau
Date: Tue Dec 17 2019 - 06:20:52 EST


Hello Jérôme,

On 2019-12-17 12:01, Jérôme Pouiller wrote:
> On Monday 16 December 2019 19:08:39 CET Felix Fietkau wrote:
>> On 2019-12-16 18:03, Jérôme Pouiller wrote:
>> > From: Jérôme Pouiller <jerome.pouiller@xxxxxxxxxx>
>> >
>> > When not using HT mode, minstrel always includes 1Mbps as fallback rate.
>> > But, when using HT mode, this fallback is not included. Yet, it seems
>> > that it could save some frames. So, this patch adds it unconditionally.
>> >
>> > Signed-off-by: Jérôme Pouiller <jerome.pouiller@xxxxxxxxxx>
>> Are you sure that's a good idea? Sometimes a little packet loss can be
>> preferable over a larger amount of airtime wasted through using really
>> low rates. Especially when you consider bufferbloat.
> I have observed that, in some circumstances, TCP throughput was far
> better with 802.11g than with 802.11n. I found that 802.11n had more Tx
> failures. These failures have big impacts on the congestion window. When
> the congestion window is low, it impacts the capacity of aggregation of
> the link. Thus, it does not help to improve the congestion windows.
>
> Investigating further, it appears that minstrel (used by 802.11g)
> always adds the 1Mbps rate to the rate list while minstrel_ht (used by
> 802.11n) doesn't (compare minstrel_update_rates() and
> minstrel_ht_update_rates()). This difference seems to be correlated to
> the difference of TCP throughput I can observe.
>
> I did some search in git history and I did not find any explanation for
> this difference between minstrel and minstrel_ht (however, it seems you
> are the right person to ask :) ). I didn't find why it would be
> efficient on minstrel and inefficient on minstrel_ht. And since this
> change fixes the issue that I observed, I have tried to apply it and wait
> for feedback.
I have found that in many cases when minstrel_ht selects sub-optimal
rates that cause too many re-transmissions or re-transmission failures,
it was because there was an issue in tx status reporting.

Another possible reason is buffering too many packets without having the
ability to alter the rates for in-flight packets based on bad tx status
results.

To find out what the driver/hardware is doing, I took a quick look and
it seems to be managing multiple tx rate policies based on per-packet
rate info. Based on that I have an idea of what you could try to make
things better:

Instead of using per-packet rate info, implement the
.sta_rate_tbl_update callback to maintain a primary tx policy used for
all non-probing non-fixed-rate packets, which you can alter while
packets using it are queued already.

The existing approach using per-packet tx_info data should then be used
only for probing or fixed-rate packets.

You then probably have to be a bit clever in the tx status path for
figuring out what rates were actually used.
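
To make the idea concrete, here is a rough sketch in plain user-space C,
not real driver code. Every name in it (struct tx_policy, struct
sta_state, primary_policy_update(), policy_for_tx(),
rate_of_last_attempt()) is hypothetical, and it assumes the tx status
report can tell which generation of the primary policy a packet was sent
with, which may or may not match what the wfx firmware provides. In a
real driver, the update function would be driven from
.sta_rate_tbl_update and would read the new table from sta->rates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STAGES 4   /* retry stages per policy (assumption) */
#define HISTORY    4   /* primary policy versions kept for tx status lookup */

/* Hypothetical retry policy: an ordered list of (rate, tries) stages. */
struct tx_policy {
	int rate[MAX_STAGES];   /* rate index per stage, -1 ends the list */
	int tries[MAX_STAGES];  /* attempts allowed at each stage */
};

struct sta_state {
	struct tx_policy hist[HISTORY]; /* ring of primary policy versions */
	unsigned int gen;               /* generation of the current primary */
};

struct tx_packet {
	bool private_rates;             /* probing or fixed-rate packet */
	struct tx_policy policy;        /* only used when private_rates is set */
};

/* What a .sta_rate_tbl_update handler might do: install a new primary. */
static void primary_policy_update(struct sta_state *sta,
				  const struct tx_policy *p)
{
	assert(sta != NULL && p != NULL);
	sta->gen++;
	sta->hist[sta->gen % HISTORY] = *p;
}

/*
 * Policy applied when the packet goes on air. Normal packets follow the
 * current primary, so an update also affects packets queued earlier.
 */
static const struct tx_policy *policy_for_tx(const struct sta_state *sta,
					     const struct tx_packet *pkt)
{
	if (pkt->private_rates)
		return &pkt->policy;
	return &sta->hist[sta->gen % HISTORY];
}

/*
 * Tx status path: map (policy generation, number of attempts) back to the
 * rate of the last attempt. Returns -1 if the generation is no longer in
 * the history or the attempt count does not fit the policy.
 */
static int rate_of_last_attempt(const struct sta_state *sta,
				unsigned int gen, int attempts)
{
	const struct tx_policy *p;
	int i;

	if (attempts < 1 || gen > sta->gen || sta->gen - gen >= HISTORY)
		return -1;
	p = &sta->hist[gen % HISTORY];
	for (i = 0; i < MAX_STAGES && p->rate[i] >= 0; i++) {
		if (attempts <= p->tries[i])
			return p->rate[i];
		attempts -= p->tries[i];
	}
	return -1;
}
```

With this layout, a data packet queued before a policy update is sent
with the updated rates, while a probing packet keeps its own per-packet
policy. The small history ring is one possible way to be "a bit clever"
in the tx status path: it lets the driver resolve the rates even if the
primary policy changed between transmission and status report.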

- Felix