[mt76] BUG: WARN_ONCE using mt7615/mt7622 in Mesh-mode

From: Frank Wunderlich
Date: Sat Oct 30 2021 - 07:01:01 EST


Hi,

I get a kernel warning when using mt7615/mt7622 in mesh mode (mt7915 has also been reported, but I have not tested it yet):

[ 1009.473796] ------------[ cut here ]------------
[ 1009.478485] WARNING: CPU: 1 PID: 288 at net/core/flow_dissector.c:984 __skb_4
[ 1009.487735] Modules linked in: xt_CHECKSUM nft_chain_nat xt_MASQUERADE nf_nas
[ 1009.517477] CPU: 1 PID: 288 Comm: napi/phy0-19 Not tainted 5.15.0-rc4-bpi-r21
[ 1009.524803] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 1009.530384] Backtrace:
[ 1009.532838] [<c0cb3878>] (dump_backtrace) from [<c0cb3ac0>] (show_stack+0x20)
[ 1009.540431] r7:000003d8 r6:c0a22cf8 r5:60010013 r4:c1012ecc
[ 1009.546090] [<c0cb3aa0>] (show_stack) from [<c0cb7870>] (dump_stack_lvl+0x48)
[ 1009.553673] [<c0cb7828>] (dump_stack_lvl) from [<c0cb7894>] (dump_stack+0x18)
[ 1009.561258] r5:00000009 r4:c10db094
[ 1009.564839] [<c0cb787c>] (dump_stack) from [<c0127350>] (__warn+0xfc/0x114)
[ 1009.571822] [<c0127254>] (__warn) from [<c0cb4118>] (warn_slowpath_fmt+0x74/)
[ 1009.579328] r7:c0a22cf8 r6:000003d8 r5:c10db094 r4:00000000
[ 1009.584988] [<c0cb40a8>] (warn_slowpath_fmt) from [<c0a22cf8>] (__skb_flow_d)
[ 1009.593799] r8:00000000 r7:c5c40418 r6:c5d8dd00 r5:c1306360 r4:c5adcc00
[ 1009.600502] [<c0a22afc>] (__skb_flow_dissect) from [<c0a24258>] (__skb_get_h)
[ 1009.608878] r10:c5a6c520 r9:c7329264 r8:00000074 r7:00000001 r6:c7328600 r50
[ 1009.616713] r4:c1439b78
[ 1009.619246] [<c0a241d4>] (__skb_get_hash) from [<bf1adf24>] (ieee80211_queue)
[ 1009.629519] r6:c7328600 r5:c5adcc00 r4:c5a6c520
[ 1009.634140] [<bf1ada20>] (ieee80211_queue_skb [mac80211]) from [<bf1ae18c>] )
[ 1009.645154] r10:c5a6c520 r9:c5a6c520 r8:00000074 r7:00000001 r6:c7328600 r50
[ 1009.652995] r4:c5adcc00
[ 1009.655530] [<bf1ae094>] (ieee80211_tx [mac80211]) from [<bf1af9e8>] (ieee80)
[ 1009.666705] r9:00000000 r8:00000000 r7:c5a6ca3c r6:c7328600 r5:c5a6cb84 r4:0
[ 1009.674454] [<bf1af910>] (ieee80211_tx_pending [mac80211]) from [<c012f384>])
[ 1009.685943] r10:c5d8c000 r9:00000040 r8:00000006 r7:00000000 r6:dedaa2ec r54
[ 1009.693780] r4:c5a6cc84
[ 1009.696313] [<c012f2c4>] (tasklet_action_common.constprop.0) from [<c012f3c0)
[ 1009.705908] r9:00000040 r8:00000101 r7:c14583e0 r6:00000006 r5:00000007 r4:8
[ 1009.713655] [<c012f398>] (tasklet_action) from [<c0101460>] (__do_softirq+0x)
[ 1009.721591] [<c0101318>] (__do_softirq) from [<c012ee00>] (do_softirq+0x7c/0)
[ 1009.729014] r10:c5f95820 r9:c5d8c000 r8:c10db338 r7:00000001 r6:c0a35f60 r50
[ 1009.736850] r4:60010013
[ 1009.739383] [<c012ed84>] (do_softirq) from [<c012eedc>] (__local_bh_enable_i)
[ 1009.747495] r5:ffffe000 r4:00000001
[ 1009.751070] [<c012ee04>] (__local_bh_enable_ip) from [<c0a35f8c>] (napi_thre)
[ 1009.759966] r5:c5a6e768 r4:c5d8c000
[ 1009.763541] [<c0a35ed0>] (napi_threaded_poll) from [<c014f010>] (kthread+0x1)
[ 1009.771394] r8:c5a6e768 r7:c0a35ed0 r6:c253ba6c r5:c5f95800 r4:c5fd6d40
[ 1009.778096] [<c014eeb8>] (kthread) from [<c0100130>] (ret_from_fork+0x14/0x2)
[ 1009.785333] Exception stack(0xc5d8dfb0 to 0xc5d8dff8)
[ 1009.790390] dfa0: ???????? ???????? ?????
[ 1009.798575] dfc0: ???????? ???????? ???????? ???????? ???????? ???????? ?????
[ 1009.806760] dfe0: ???????? ???????? ???????? ???????? ???????? ????????
[ 1009.813383] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r58
[ 1009.821220] r4:c5fd6d40 r3:00000017
[ 1009.824871] ---[ end trace 86a4ea831c8189bf ]---


I have applied this patch to get mesh basically working (ping works; only this warning shows up):

https://patchwork.kernel.org/project/linux-wireless/patch/20211007225725.2615-1-vincent@xxxxxxxxxxxx/#24506297

The configuration is basically this:

iw phy phy0 interface add mesh0 type mp
ip addr add 192.168.80.1/24 dev mesh0
./wpa_supplicant -i mesh0 -c /etc/wpa_supplicant/meshpoint.conf &

where /etc/wpa_supplicant/meshpoint.conf contains:

user_mpm=1
network={
ssid="bpi-mesh"
mode=5
frequency=2412
key_mgmt=NONE
}

As far as I have debugged it, the warning is triggered here:

https://github.com/frank-w/BPI-R2-4.14/blob/b61ad126d9da67a52fa395215dc3a4219ff58121/net/core/flow_dissector.c#L984

All three possible sources of net (the net argument, skb->dev and skb->sk) are NULL when this happens:

[ 104.656046] DEBUG: Passed __skb_flow_dissect 975 skb:0xc5ad9540 net:0x0
[ 104.662738] DEBUG: Passed __skb_flow_dissect 977 skb-dev:0x0,skb-sk:0x0

The call flow, as far as I have debugged it (mostly in flow_dissector.c):

__skb_get_hash() => ___skb_get_hash() => skb_flow_dissect_flow_keys() (include/linux/skbuff.h) => __skb_flow_dissect(NULL, skb,...)

So net always has to be taken from the skb, either from skb->dev or from skb->sk. Both are NULL, so I need to find out where the problematic skb is created.
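To make the lookup described above concrete, here is a minimal userspace model in C. It is a sketch, not kernel code: the structs are hypothetical stand-ins that keep only the fields needed here, and resolve_net() and hash_path_would_warn() are illustrative names, not kernel functions. It assumes the fallback order described above (explicit net argument, then skb->dev, then skb->sk) and that, on the hash path, skb_flow_dissect_flow_keys() passes net as NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the pointers
 * needed for the net lookup are modelled. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff {
	struct net_device *dev;
	struct sock *sk;
};

/* Fallback order as described above: use the explicit net argument
 * if given, otherwise derive it from skb->dev, then from skb->sk. */
static struct net *resolve_net(struct net *net, const struct sk_buff *skb)
{
	if (!net && skb) {
		if (skb->dev)
			net = skb->dev->nd_net;
		else if (skb->sk)
			net = skb->sk->sk_net;
	}
	return net;
}

/* On the hash path the net argument is NULL, so the result depends
 * only on skb->dev and skb->sk. Returns true when no net can be found,
 * i.e. the situation shown by the debug output above. */
static bool hash_path_would_warn(const struct sk_buff *skb)
{
	return resolve_net(NULL, skb) == NULL;
}
```

In this model an skb with either dev or sk set resolves a net, while one with both NULL does not, which matches the debug output. That is why the open question is which code path hands mac80211 an skb without skb->dev set.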

From the trace I cannot see where net/skb->dev is set :(

Based on some similar bugs I found, I tried changing this in mt76_mcu_msg_alloc() in drivers/net/wireless/mediatek/mt76/mcu.c:

- skb = alloc_skb(length, GFP_KERNEL);
+ skb = netdev_alloc_skb(&dev->napi_dev, length);

This did not help.

The warning appears on the first device when the connection is established, and on the second device when starting a ping (the ping itself works).

I was told that this trace does not appear on OpenWrt master, but I have not yet found out why.

More information here:
https://forum.banana-pi.org/t/solved-bpi-r64-mesh-802-11s-on-internal-wifi-card-mt7622av/12610/51?u=frank-w

Does anyone have an idea why this happens and how to fix it?

Regards,
Frank