[PATCH net v2] net: Handle napi_schedule() calls from non-interrupt

From: Frederic Weisbecker
Date: Sun Feb 23 2025 - 17:17:26 EST


napi_schedule() is expected to be called either:

* From an interrupt, where raised softirqs are handled on IRQ exit

* From a softirq disabled section, where raised softirqs are handled on
the next call to local_bh_enable().

* From a softirq handler, where raised softirqs are handled on the next
round in do_softirq(), or further deferred to a dedicated kthread.

Other bare tasks context may end up ignoring the raised NET_RX vector
until the next random softirq handling opportunity, which may not
happen before a while if the CPU goes idle afterwards with the tick
stopped.

Such "misuses" have been detected on several places thanks to messages
of the kind:

"NOHZ tick-stop error: local softirq work is pending, handler #08!!!"

For example:

__raise_softirq_irqoff
__napi_schedule
rtl8152_runtime_resume.isra.0
rtl8152_resume
usb_resume_interface.isra.0
usb_resume_both
__rpm_callback
rpm_callback
rpm_resume
__pm_runtime_resume
usb_autoresume_device
usb_remote_wakeup
hub_event
process_one_work
worker_thread
kthread
ret_from_fork
ret_from_fork_asm

And also:

* drivers/net/usb/r8152.c::rtl_work_func_t
* drivers/net/netdevsim/netdev.c::nsim_start_xmit

There is a long history of issues of this kind:

019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
330068589389 ("idpf: disable local BH when scheduling napi for marker packets")
e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi schedule")
c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi enable/schedule")
970be1dff26d ("mt76: disable BH around napi_schedule() calls")
019edd01d174 ("ath10k: sdio: Add missing BH locking around napi_schdule()")
30bfec4fec59 ("can: rx-offload: can_rx_offload_threaded_irq_finish(): add new function to be called from threaded interrupt")
e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()")
83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
8cf699ec849f ("mlx4: do not call napi_schedule() without care")
ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")

This shows that relying on the caller to arrange a proper context for
the softirqs to be handled while calling napi_schedule() is very fragile
and error prone. Also fixing them can also prove challenging if the
caller may be called from different kinds of contexts.

Therefore fix this from napi_schedule() itself with waking up ksoftirqd
when softirqs are raised from task contexts.

Reported-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
Reported-by: Jakub Kicinski <kuba@xxxxxxxxxx>
Reported-by: Francois Romieu <romieu@xxxxxxxxxxxxx>
Closes: https://lore.kernel.org/lkml/354a2690-9bbf-4ccb-8769-fa94707a9340@xxxxxxxxxxxxx/
Cc: Breno Leitao <leitao@xxxxxxxxxx>
Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
Cc: Paolo Abeni <pabeni@xxxxxxxxxx>
Cc: Francois Romieu <romieu@xxxxxxxxxxxxx>
Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
---
net/core/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 80e415ccf2c8..5c1b93a3f50a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4693,7 +4693,7 @@ static inline void ____napi_schedule(struct softnet_data *sd,
* we have to raise NET_RX_SOFTIRQ.
*/
if (!sd->in_net_rx_action)
- __raise_softirq_irqoff(NET_RX_SOFTIRQ);
+ raise_softirq_irqoff(NET_RX_SOFTIRQ);
}

#ifdef CONFIG_RPS
--
2.48.1