Re: [PATCH] wifi: cfg80211: Lock wiphy in cfg80211_get_station

From: Antonio Quartulli
Date: Tue May 21 2024 - 10:50:44 EST


Hi,

On 21/05/2024 14:15, Remi Pommarel wrote:
On Tue, May 21, 2024 at 09:43:56AM +0200, Antonio Quartulli wrote:
Hi,

On 18/05/2024 17:50, Remi Pommarel wrote:
Wiphy should be locked before calling rdev_get_station() (see lockdep
assert in ieee80211_get_station()).

Adding the lock is fine as nowadays it is taken in pre_doit and released in
post_doit (with some exceptions). Therefore when invoking .get_station from
a side path the lock should be taken too.

It was actually a05829a7222e9d10c416dd2dbbf3929fe6646b89 that introduced
this requirement AFAICS.

IIUC lock requirement was already there before this commit, only it was on
rtnl lock to be taken instead of wiphy one.

You're right.




This fixes the following kernel NULL dereference:

As already said by Johannes, I am not sure it truly fixes this NULL
dereference though.

Have you checked where in ath10k_sta_statistics this is exactly happening?
Do you think some sta was partly released and thus fields were NULLified?

ath10k_sta_statistics+0x10 is associated with the arsta->arvif->ar
statement. It crashes because arsta->arvif is NULL.

Here is a scenario that explains the crash where the same sta
disconnects then reconnects quickly (e.g. OPEN network):


CPU0: CPU1:

batadv_v_elp_periodic_work()
queue_work(batadv_v_elp_get_throughput)

ieee80211_del_station()
ieee80211_add_station()
sta_info_insert()
list_add_tail_rcu()
ath10k_sta_state()
memset(arsta, 0, sizeof(arsta))
batadv_v_elp_get_throughput()
cfg80211_get_station()
ieee80211_get_station() <-- Find sta with its MAC in list
ath10k_sta_statistics()
arsta->arvif->ar <-- arsta is still zeroed


In other words if the same sta has enough time to disconnect then
reconnect before batadv_v_elp_get_throughput get scheduled, sta could be
partially initialized when ath10k_sta_statistics() is called. Locking
the wiphy ensure sta is fully initialized if found.


We were just wondering how you could get the arvif being NULL and your explanation makes sense.

This said, holding the lock is required when invoking get_station via netlink, therefore it's meaningful to hold it even when side invoking it from another module.

Acked-by: Antonio Quartulli <a@xxxxxxxxxxx>


--
Antonio Quartulli