Re: RTL8723BE performance regression
From: João Paulo Rechi Vita
Date: Mon May 07 2018 - 17:49:56 EST
On Tue, May 1, 2018 at 10:58 PM, Pkshih <pkshih@xxxxxxxxxxx> wrote:
> On Wed, 2018-05-02 at 05:44 +0000, Pkshih wrote:
>>
>> > -----Original Message-----
>> > From: João Paulo Rechi Vita [mailto:jprvita@xxxxxxxxx]
>> > Sent: Wednesday, May 02, 2018 6:41 AM
>> > To: Larry Finger
>> > Cc: Steve deRosier; èåå; Pkshih; Birming Chiu; Shaofu; Steven Ting; Chaoming_Li; Kalle Valo;
>> > linux-wireless; Network Development; LKML; Daniel Drake; João Paulo Rechi Vita; linux@xxxxxxxxxx
>> om
>> > Subject: Re: RTL8723BE performance regression
>> >
>> > On Tue, Apr 3, 2018 at 7:51 PM, Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote:
>> > > On 04/03/2018 09:37 PM, João Paulo Rechi Vita wrote:
>> > >>
>> > >> On Tue, Apr 3, 2018 at 7:28 PM, Larry Finger <Larry.Finger@xxxxxxxxxxxx>
>> > >> wrote:
>> > >>
>> > >> (...)
>> > >>
>> > >>> As the antenna selection code changes affected your first bisection, do
>> > >>> you
>> > >>> have one of those HP laptops with only one antenna and the incorrect
>> > >>> coding
>> > >>> in the FUSE?
>> > >>
>> > >>
>> > >> Yes, that is why I've been passing ant_sel=1 during my tests -- this
>> > >> was needed to achieve a good performance in the past, before this
>> > >> regression. I've also opened the laptop chassis and confirmed the
>> > >> antenna cable is plugged to the connector labeled with "1" on the
>> > >> card.
>> > >>
>> > >>> If so, please make sure that you still have the same signal
>> > >>> strength for good and bad cases. I have tried to keep the driver and the
>> > >>> btcoex code in sync, but there may be some combinations of antenna
>> > >>> configuration and FUSE contents that cause the code to fail.
>> > >>>
>> > >>
>> > >> What is the recommended way to monitor the signal strength?
>> > >
>> > >
>> > > The btcoex code is developed for multiple platforms by a different group
>> > > than the Linux driver. I think they made a change that caused ant_sel to
>> > > switch from 1 to 2. At least numerous comments at
>> > > github.com/lwfinger/rtlwifi_new claimed they needed to make that change.
>> > >
>> > > My recommended method is to verify the wifi device name with "iw dev". Then
>> > > using that device
>> > >
>> > > sudo iw dev <dev_name> scan | egrep "SSID|signal"
>> > >
>> >
>> > I have confirmed that the performance regression is indeed tied to
>> > signal strength: in the good cases signal was between -16 and -8 dBm,
>> > whereas in bad cases signal was always between -50 and -40 dBm. I've
>> > also switched to testing bandwidth in controlled LAN environment using
>> > iperf3, as suggested by Steve deRosier, with the DUT being the only
>> > machine connected to the 2.4 GHz radio and the machine running the
>> > iperf3 server connected via ethernet.
>> >
>>
>> We have new experimental results in commit af8a41cccf8f46 ("rtlwifi: cleanup
>> 8723be ant_sel definition"). You can use the above commit and do the same
>> experiments (with ant_sel=0, 1 and 2) on your side, and then share your results.
>> Since performance is tied to signal strength, you can share only the signal strength.
>>
>
> Please make sure to do a cold reboot whenever ant_sel is changed.
>
I've tested the commit mentioned above and it fixes the problem on top
of v4.16 (the latest wireless-drivers-next is also fixed, as it already
contains that commit). On v4.15, we also need the following commits
before "af8a41cccf8f rtlwifi: cleanup 8723be ant_sel definition" to get
good performance again:
874e837d67d0 rtlwifi: fill FW version and subversion
a44709bba70f rtlwifi: btcoex: Add power_on_setting routine
40d9dd4f1c5d rtlwifi: btcoex: Remove global variables from btcoex
Surprisingly, it seems forcing ant_sel=1 is not needed anymore on
these machines, as shown by the numbers below (ant_sel=0 means that
no parameter was actually passed to the module). I powered off the
machine and did a cold boot for every test. It seems something has
changed in the antenna auto-selection code since v4.11, the latest
point where I could confirm we definitely needed to force ant_sel=1.
I've been trying to understand what causes this difference, but
haven't made progress so far, so any suggestions are appreciated (we
are trying to decide if we can confidently drop the downstream DMI
quirks for these specific machines).
w-d-n ant_sel=0: -14.00 dBm, 69.5 Mbps -> good
w-d-n ant_sel=1: -10.00 dBm, 41.1 Mbps -> good
w-d-n ant_sel=2: -44.00 dBm,  607 kbps -> bad
v4.16 ant_sel=0: -12.00 dBm, 63.0 Mbps -> good
v4.16 ant_sel=1:  -8.00 dBm, 69.0 Mbps -> good
v4.16 ant_sel=2: -50.00 dBm,  224 kbps -> bad
v4.15 ant_sel=0:  -8.00 dBm, 33.0 Mbps -> good
v4.15 ant_sel=1: -10.00 dBm, 38.1 Mbps -> good
v4.15 ant_sel=2: -48.00 dBm,  206 kbps -> bad
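
For anyone who wants to repeat the signal measurements, here is a minimal
sketch of how the "iw dev <dev_name> scan" output could be reduced to one
signal value per SSID. It is only an illustration, not the script used for
the numbers above: the function names are made up for this example, and it
assumes each BSS block in the scan output carries a "signal: <value> dBm"
line and an "SSID: <name>" line.

```python
import re
import subprocess

# Matches lines such as "signal: -44.00 dBm" in `iw ... scan` output.
SIGNAL_RE = re.compile(r"signal:\s*(-?\d+(?:\.\d+)?)\s*dBm")


def _record(best, ssid, signal):
    """Keep the strongest signal seen so far for each SSID."""
    if ssid is None or signal is None:
        return
    if ssid not in best or signal > best[ssid]:
        best[ssid] = signal


def parse_iw_scan(text):
    """Return {ssid: signal_dbm} parsed from `iw dev <dev> scan` output."""
    best = {}
    ssid = None
    signal = None
    for raw in text.splitlines():
        # A new access point starts with an unindented "BSS <mac>" line;
        # indented lines such as "BSS Load:" belong to the current block.
        if raw.startswith("BSS "):
            _record(best, ssid, signal)
            ssid = None
            signal = None
            continue
        line = raw.strip()
        match = SIGNAL_RE.match(line)
        if match:
            signal = float(match.group(1))
        elif line.startswith("SSID:"):
            ssid = line[len("SSID:"):].strip()
    _record(best, ssid, signal)
    return best


def scan(dev):
    """Run a scan on `dev` (usually needs root) and parse the result."""
    out = subprocess.run(
        ["iw", "dev", dev, "scan"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_iw_scan(out)
```

Calling scan() with the device name reported by "iw dev", once after each
cold boot, gives the per-SSID signal for that ant_sel setting.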
--
João Paulo Rechi Vita
http://about.me/jprvita