6.15-rc1/regression/bisected - commit 9dd05df8403b introduced a new warning when I unload mt7921e module

From: Mikhail Gavrilov
Date: Wed Apr 09 2025 - 18:54:34 EST


Hi,
I probably would not have noticed this, because in normal use I never
need to unload the mt7921e module.
But since commit 9dd05df8403b (found with git bisect), I see a warning
on every system shutdown and reboot.

The warning can also be reproduced with a single command:
# rmmod mt7921e

This produces the following stack trace:
[ 182.293388] ------------[ cut here ]------------
[ 182.293515] WARNING: CPU: 28 PID: 4057 at net/core/dev.c:7295
__netif_napi_del_locked+0x340/0x420
[ 182.293527] Modules linked in: uinput rfcomm snd_seq_dummy
snd_hrtimer nft_queue nfnetlink_queue nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nf_tables ip_set qrtr bnep sunrpc binfmt_misc amd_atl intel_rapl_msr
intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic
mt7921e(-) snd_hda_scodec_component snd_hda_codec_hdmi mt7921_common
btusb edac_mce_amd btrtl mt792x_lib btintel mt76_connac_lib
snd_hda_intel btbcm snd_intel_dspcfg mt76 btmtk snd_intel_sdw_acpi
bluetooth kvm_amd snd_hda_codec vfat mac80211 fat snd_hda_core kvm
snd_hwdep snd_seq snd_seq_device spd5118 libarc4 snd_pcm r8169
wmi_bmof rapl i2c_piix4 snd_timer cfg80211 pcspkr k10temp i2c_smbus
snd rfkill realtek soundcore joydev gpio_amdpt gpio_generic loop
nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp i2c_algo_bit
drm_ttm_helper ttm drm_exec gpu_sched nvme
[ 182.293683] drm_suballoc_helper polyval_clmulni polyval_generic
drm_panel_backlight_quirks ghash_clmulni_intel drm_buddy ucsi_ccg
sha512_ssse3 drm_display_helper nvme_core typec_ucsi sha256_ssse3
sha1_ssse3 typec nvme_keyring sp5100_tco cec nvme_auth video wmi fuse
[ 182.293750] CPU: 28 UID: 0 PID: 4057 Comm: rmmod Tainted: G
W L ------ --- 6.15.0-0.rc1.15.fc43.x86_64+debug #1
PREEMPT(lazy)
[ 182.293758] Tainted: [W]=WARN, [L]=SOFTLOCKUP
[ 182.293762] Hardware name: ASRock B650I Lightning WiFi/B650I
Lightning WiFi, BIOS 3.08 09/18/2024
[ 182.293766] RIP: 0010:__netif_napi_del_locked+0x340/0x420
[ 182.293772] Code: 0f 85 f2 00 00 00 48 8b 43 30 be ff ff ff ff 48
8d b8 b8 0d 00 00 e8 bf 73 a0 00 85 c0 0f 85 17 fd ff ff 0f 0b e9 10
fd ff ff <0f> 0b e9 5b fd ff ff 48 c7 c7 f4 ad ab b0 e8 dd 77 f5 fd e9
ea fc
[ 182.293777] RSP: 0018:ffffc90029ccf958 EFLAGS: 00010246
[ 182.293783] RAX: 0000000000000020 RBX: ffff88824f177f88 RCX: 0000000000000001
[ 182.293787] RDX: 1ffff11049e2eff3 RSI: 0000000000000008 RDI: ffff88824f177f98
[ 182.293790] RBP: ffff88824f177f98 R08: ffffffffad1e2bdb R09: ffffed1049e2eff3
[ 182.293794] R10: ffffed1049e2eff4 R11: 000000004bdb515b R12: ffff88824f1766c8
[ 182.293797] R13: ffff88824f173320 R14: ffff88811189c000 R15: dffffc0000000000
[ 182.293800] FS: 00007f1b21ce4740(0000) GS:ffff889028978000(0000)
knlGS:0000000000000000
[ 182.293805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 182.293808] CR2: 00007f94de36efe8 CR3: 000000023c9e6000 CR4: 0000000000f50ef0
[ 182.293812] PKRU: 55555554
[ 182.293816] Call Trace:
[ 182.293819] <TASK>
[ 182.293825] mt76_dma_cleanup+0xbd/0x7c0 [mt76]
[ 182.293845] mt7921_pci_remove+0x180/0x340 [mt7921e]
[ 182.293854] pci_device_remove+0xad/0x210
[ 182.293862] device_release_driver_internal+0x36d/0x520
[ 182.293871] driver_detach+0xc4/0x1a0
[ 182.293878] bus_remove_driver+0x11c/0x2a0
[ 182.293886] pci_unregister_driver+0x2a/0x250
[ 182.293891] ? find_module_all+0xec/0x120
[ 182.293900] __do_sys_delete_module+0x36a/0x580
[ 182.293905] ? __pfx___call_rcu_common.constprop.0+0x10/0x10
[ 182.293912] ? __pfx___do_sys_delete_module+0x10/0x10
[ 182.293922] ? kmem_cache_free+0x3ca/0x570
[ 182.293937] do_syscall_64+0x96/0x1a0
[ 182.293946] ? lockdep_hardirqs_on+0x8c/0x130
[ 182.293951] ? do_syscall_64+0xa3/0x1a0
[ 182.293955] ? do_syscall_64+0xa3/0x1a0
[ 182.293961] ? mark_usage+0x65/0x180
[ 182.293967] ? local_clock+0x15/0x30
[ 182.293971] ? __lock_acquire+0x40f/0x1160
[ 182.293976] ? __lock_release.isra.0+0xb2/0x340
[ 182.293981] ? __pfx___handle_mm_fault+0x10/0x10
[ 182.293991] ? find_held_lock+0x2b/0x80
[ 182.293999] ? __lock_release.isra.0+0x1cb/0x340
[ 182.294009] ? __lock_release.isra.0+0x1cb/0x340
[ 182.294018] ? do_user_addr_fault+0x4b1/0xa60
[ 182.294030] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 182.294034] RIP: 0033:0x7f1b21503efb
[ 182.294049] Code: 73 01 c3 48 8b 0d 05 1f 0f 00 f7 d8 64 89 01 48
83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 1e 0f 00 f7 d8 64 89
01 48
[ 182.294053] RSP: 002b:00007ffff1a01668 EFLAGS: 00000202 ORIG_RAX:
00000000000000b0
[ 182.294059] RAX: ffffffffffffffda RBX: 000056276dc577a0 RCX: 00007f1b21503efb
[ 182.294063] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000056276dc57800
[ 182.294066] RBP: 00007ffff1a01690 R08: 0000000000000000 R09: 0000000000000000
[ 182.294069] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 182.294073] R13: 00007ffff1a01d86 R14: 00007ffff1a018e0 R15: 0000000000000000
[ 182.294085] </TASK>
[ 182.294088] irq event stamp: 55129
[ 182.294091] hardirqs last enabled at (55137): [<ffffffffaa834afe>]
__up_console_sem+0x7e/0x90
[ 182.294097] hardirqs last disabled at (55144): [<ffffffffaa834ae3>]
__up_console_sem+0x63/0x90
[ 182.294102] softirqs last enabled at (52944): [<ffffffffaa613062>]
handle_softirqs+0x592/0x860
[ 182.294108] softirqs last disabled at (52937): [<ffffffffaa613466>]
__irq_exit_rcu+0x126/0x240
[ 182.294113] ---[ end trace 0000000000000000 ]---
[ 209.265604] workqueue: gc_worker [nf_conntrack] hogged CPU for
>10000us 11 times, consider switching to WQ_UNBOUND

commit 9dd05df8403bda5b68178b795c554b3940628bb6
Author: Jakub Kicinski <kuba@xxxxxxxxxx>
Date: Mon Feb 3 13:58:16 2025 -0800

net: warn if NAPI instance wasn't shut down
....
Drivers should always disable a NAPI instance before removing it.
If they don't the instance may be queued for polling.
Since commit 86e25f40aa1e ("net: napi: Add napi_config")
we also remove the NAPI from the busy polling hash table
in napi_disable(), so not disabling would leave a stale
entry there.
....
Use of busy polling is relatively uncommon so bugs may be lurking
in the drivers. Add an explicit warning.
....
Reviewed-by: Joe Damato <jdamato@xxxxxxxxxx>
Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Link: https://patch.msgid.link/20250203215816.1294081-1-kuba@xxxxxxxxxx
Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>

net/core/dev.c | 3 +++
1 file changed, 3 insertions(+)

From the commit message, I understand that the warning was added on
purpose, so that the maintainers of the affected drivers would fix
them, right?
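
To illustrate what I think the new check expects, here is a minimal
userspace sketch. It is a toy model, not kernel code: the toy_* names,
the two state bits, and the assumption that "disabled or never enabled"
is tracked by a single SCHED-style bit are my own simplification of the
commit message, not something taken from net/core/dev.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy state bits (hypothetical, loosely named after the kernel's). */
enum {
    TOY_NAPI_SCHED  = 1u << 0, /* set while the instance is disabled */
    TOY_NAPI_LISTED = 1u << 1, /* set while the instance is registered */
};

struct toy_napi {
    unsigned int state;
};

/* Register an instance; in this model it starts out disabled. */
void toy_napi_add(struct toy_napi *n)
{
    assert(n != NULL);
    n->state = TOY_NAPI_LISTED | TOY_NAPI_SCHED;
}

/* Enabling clears the "disabled" bit so the instance may be polled. */
void toy_napi_enable(struct toy_napi *n)
{
    n->state &= ~TOY_NAPI_SCHED;
}

/* Disabling sets the bit again; a real driver would also wait for polling to finish. */
void toy_napi_disable(struct toy_napi *n)
{
    n->state |= TOY_NAPI_SCHED;
}

/*
 * Remove an instance. Returns true if the model "warns", i.e. the
 * instance was still enabled when it was removed.
 */
bool toy_napi_del(struct toy_napi *n)
{
    if (!(n->state & TOY_NAPI_LISTED))
        return false; /* not registered, nothing to do */
    n->state &= ~TOY_NAPI_LISTED;

    if (!(n->state & TOY_NAPI_SCHED)) {
        fprintf(stderr, "WARNING: toy NAPI removed while still enabled\n");
        return true;
    }
    return false;
}
```

In this model, add, enable, disable, del is silent, while add, enable,
del warns. If the model is right, the trace above would mean that
mt76_dma_cleanup() removes at least one NAPI instance that was enabled
and never disabled, but that is only my guess from the call trace.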

Who could look into it? Deren?

My machine spec: https://linux-hardware.org/?probe=fa0da30b11
I have attached my build config, the full kernel log, and the bisect
log below.

--
Best Regards,
Mike Gavrilov.

Attachment: .config.zip
Description: Zip archive

Attachment: dmesg.zip
Description: Zip archive

Attachment: bisect-log-warning-at-__netif_napi_del_locked.zip
Description: Zip archive