Re: [PATCH wireless] wifi: wlcore: fix wlcore AP mode
From: Russell King (Oracle)
Date: Tue May 28 2024 - 04:50:48 EST
On Tue, May 28, 2024 at 09:36:43AM +0100, Russell King wrote:
> From: Johannes Berg <johannes.berg@xxxxxxxxx>
>
> Using wl183x devices in AP mode with various firmwares is not stable.
>
> The driver currently adds a station to firmware with basic rates when it
> is first known to the stack using the CMD_ADD_PEER command. Once the
> station has finished authorising, another CMD_ADD_PEER command is issued
> to update the firmware with the rates the station can use.
>
> However, after a random amount of time, the firmware ignores the power
> management nullfunc frames from the station, and tries to send packets
> while the station is asleep, resulting in lots of retries dropping down
> in rate due to no response. This restricts the available bandwidth.
>
> With this happening with several stations, the user visible effect is
> the latency of interactive connections increases significantly, packets
> get dropped, and in general the WiFi connections become unreliable and
> unstable.
>
> Eventually, the firmware transmit queue appears to get stuck - with
> packets and blocks allocated that never clear.
>
> TI have a couple of patches that address this, but they touch the
> mac80211 core to disable NL80211_FEATURE_FULL_AP_CLIENT_STATE for *all*
> wireless drivers, which has the effect of not adding the station to the
> stack until later when the rates are known. This is a sledgehammer
> approach to solving the problem.
>
> The solution implemented here has the same effect, but without
> impacting all drivers.
>
> We delay adding the station to firmware until it has been authorised
> in the driver, and correspondingly remove the station when unwinding
> from authorised state. Adding the station to firmware allocates a hlid,
> which will now happen later than the driver expects. Therefore, we need
> to track when this happens so that we transmit using the correct hlid.
>
> This patch is an equivalent fix to these two patches in TI's
> wilink8-wlan repository:
>
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0004-mac80211-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0005-wlcore-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
>
> Reported-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> Co-developed-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> Tested-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
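
To make the ordering change described in the commit message concrete,
here is a small standalone C model of it. It is only a sketch: the type
and function names (toy_sta, fw_add_peer, toy_tx_hlid and so on) are
invented for illustration and are not the wlcore or mac80211 API, and
the fallback link used before the station has its own hlid is an
assumption rather than something taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
#define TOY_INVALID_HLID 0xff

enum toy_state {
	TOY_NOTEXIST,
	TOY_NONE,
	TOY_AUTH,
	TOY_ASSOC,
	TOY_AUTHORIZED,
};

struct toy_sta {
	unsigned char hlid;	/* only meaningful while in_fw is set */
	int in_fw;		/* has the "firmware" been told about us? */
};

static unsigned char toy_next_hlid = 1;

static void toy_sta_init(struct toy_sta *sta)
{
	sta->hlid = TOY_INVALID_HLID;
	sta->in_fw = 0;
}

/* Models the add-peer command: this is the point where a hlid is allocated. */
static void fw_add_peer(struct toy_sta *sta)
{
	assert(!sta->in_fw);
	sta->hlid = toy_next_hlid++;
	sta->in_fw = 1;
}

/* Models removing the peer: the hlid is no longer usable afterwards. */
static void fw_remove_peer(struct toy_sta *sta)
{
	assert(sta->in_fw);
	sta->hlid = TOY_INVALID_HLID;
	sta->in_fw = 0;
}

/*
 * State transition hook. The station is only added to the firmware on
 * ASSOC -> AUTHORIZED, and removed again when unwinding from AUTHORIZED.
 * Earlier transitions deliberately do nothing towards the firmware.
 */
static void toy_sta_state(struct toy_sta *sta,
			  enum toy_state from, enum toy_state to)
{
	if (from == TOY_ASSOC && to == TOY_AUTHORIZED)
		fw_add_peer(sta);
	else if (from == TOY_AUTHORIZED && to == TOY_ASSOC)
		fw_remove_peer(sta);
}

/*
 * Pick the link to transmit on. Until the station is in the firmware it
 * has no hlid of its own, so a caller-supplied fallback is used (assumed).
 */
static unsigned char toy_tx_hlid(const struct toy_sta *sta,
				 unsigned char fallback)
{
	return sta->in_fw ? sta->hlid : fallback;
}
```

The point of toy_tx_hlid() is the "track when this happens" part of the
description: the transmit path has to check whether the hlid exists yet
rather than assume it was allocated when the station first appeared.
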

Please note that this patch fixes just one of the issues with the
driver. There remain other firmware bugs that make AP mode
unreliable. For example:

When a station, e.g. a phone, moves out of range of the AP while the
station is in power saving mode, packets become stuck in the transmit
queue. With additional debugging added to the driver, I see:

  Unable to flush all frames for station xx:xx:xx:ee:11:fe for hlid 3
  FW time: 1675524181
  Frame 0: expires 1394140264 MAC xx:xx:xx:ee:11:fe FC 17032
  Frame 1: expires 1394264633 MAC xx:xx:xx:ee:11:fe FC 17032

These packets get removed by the firmware when the peer is removed.
However, if the broadcast hlid was in power saving at the time, then
it appears the broadcast hlid gets similarly stuck, leading to the
entire network eventually falling over because the AP is effectively
blocking broadcast ARP requests.

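As a rough illustration of how to read the debug output above, the
following C sketch computes how far past its expiry a queued frame is.
It assumes that "FW time" and "expires" are free-running 32-bit counters
in the same units, which the output suggests but which is not confirmed
here; the function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed distance from the expiry time to the current firmware time.
 * The subtraction is done in unsigned 32-bit arithmetic so that it stays
 * correct across a counter wrap; the conversion to int32_t relies on the
 * usual two's complement behaviour of common compilers.
 * A positive result means the frame's expiry time has already passed.
 */
static int32_t frame_overdue_by(uint32_t fw_time, uint32_t expires)
{
	return (int32_t)(fw_time - expires);
}

/* True if the frame should already have been dropped or completed. */
static bool frame_is_stale(uint32_t fw_time, uint32_t expires)
{
	return frame_overdue_by(fw_time, expires) > 0;
}
```

Under that assumption, the two frames shown are 281383917 and 281259548
ticks past their expiry, i.e. they expired well before the flush was
attempted and were never cleared.
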
I can find no way around this - and I suspect there is some kind of
refcounting bug in the firmware when told to remove a peer which has
queued packets.

My best workaround at the moment is to monitor the state of the driver
via debugfs and, when this problem presents, to take the AP down and
bring it back up. This restarts the firmware, but has the side effect
of kicking all connected devices off the network.

Another workaround is to turn WiFi off on the phone before moving
it out of range!

I will attempt to get captures of the network at some point, both of
the packets at the AP network interface and of the radio side.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!