Re: [PATCH] net: Handle napi_schedule() calls from non-interrupt

From: MOESSBAUER, Felix
Date: Mon Mar 03 2025 - 04:47:06 EST


On Fri, 2025-02-21 at 23:12 +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 21, 2025 at 12:59:26PM -0500, Joe Damato a écrit :
> > On Fri, Feb 21, 2025 at 06:30:09PM +0100, Frederic Weisbecker
> > wrote:
> > > napi_schedule() is expected to be called either:
> > >
> > > * From an interrupt, where raised softirqs are handled on IRQ
> > > exit
> > >
> > > * From a softirq disabled section, where raised softirqs are
> > > handled on
> > >   the next call to local_bh_enable().
> > >
> > > * From a softirq handler, where raised softirqs are handled on
> > > the next
> > >   round in do_softirq(), or further deferred to a dedicated
> > > kthread.
> > >
> > > Other bare tasks context may end up ignoring the raised NET_RX
> > > vector
> > > until the next random softirq handling opportunity, which may not
> > > happen before a while if the CPU goes idle afterwards with the
> > > tick
> > > stopped.
> > >
> > > Such "misuses" have been detected on several places thanks to
> > > messages
> > > of the kind:
> > >
> > > "NOHZ tick-stop error: local softirq work is pending,
> > > handler #08!!!"
> >
> > Might be helpful to include the stack trace of the offender you did
> > find which led to this change?
>
> There are several of them. Here is one example:
>
> __raise_softirq_irqoff
> __napi_schedule
> rtl8152_runtime_resume.isra.0
> rtl8152_resume
> usb_resume_interface.isra.0
> usb_resume_both
> __rpm_callback
> rpm_callback
> rpm_resume
> __pm_runtime_resume
> usb_autoresume_device
> usb_remote_wakeup
> hub_event
> process_one_work
> worker_thread
> kthread
> ret_from_fork
> ret_from_fork_asm
>
> There is also drivers/net/usb/r8152.c::rtl_work_func_t
>
> And also netdevsim:
> https://lore.kernel.org/netdev/20250219-netdevsim-v3-1-811e2b8abc4c@xxxxxxxxxx/
>
> And probably others...

Hi, thanks for bringing this up. This topic is currently also discussed
on the linux-rt-users list. +CC Sebastian.

https://www.spinics.net/lists/linux-rt-users/msg28317.html


>
> I proposed a runtime detection here:
>
>  
> https://lore.kernel.org/lkml/20250212174329.53793-2-frederic@xxxxxxxxxx/

It would be pretty helpful to have a tracepoint there to easily get
callstacks in case the warning happens. Currently we are tracing this
by adding a filter on the printk message.

>
> But I plan to actually introduce a more generic detection in
> __raise_softirq_irqsoff() itself instead.
>  
> > Based on the scope of the problem it might be better to fix the
> > known offenders and add a WARN_ON_ONCE or something instead of the
> > proposed change? Not sure, but having more information might help
> > make that determination.
>
> Well, based on the fix proposal I see here:
> https://lore.kernel.org/netdev/20250219-netdevsim-v3-1-811e2b8abc4c@xxxxxxxxxx/
>
> I think that fixing this on the caller level can be very error prone
> and involve nasty workarounds.
>
> Oh you just made me look at the past:
>
>   019edd01d174 ("ath10k: sdio: Add missing BH locking around
> napi_schdule()")
>   330068589389 ("idpf: disable local BH when scheduling napi for
> marker packets")
>   e3d5d70cb483 ("net: lan78xx: fix "softirq work is pending" error")
>   e55c27ed9ccf ("mt76: mt7615: add missing bh-disable around rx napi
> schedule")
>   c0182aa98570 ("mt76: mt7915: add missing bh-disable around tx napi
> enable/schedule")
>   970be1dff26d ("mt76: disable BH around napi_schedule() calls")
>   019edd01d174 ("ath10k: sdio: Add missing BH locking around
> napi_schdule()")
>   30bfec4fec59 ("can: rx-offload:
> can_rx_offload_threaded_irq_finish(): add new  function to be called
> from threaded interrupt")
>   e63052a5dd3c ("mlx5e: add add missing BH locking around
> napi_schdule()")
>   83a0c6e58901 ("i40e: Invoke softirqs after napi_reschedule")
>   bd4ce941c8d5 ("mlx4: Invoke softirqs after napi_reschedule")
>   8cf699ec849f ("mlx4: do not call napi_schedule() without care")
>   ec13ee80145c ("virtio_net: invoke softirqs after __napi_schedule")
>
> I think this just shows how successful it has been to leave the
> responsibility to the
> caller so far.
>
> And also note that these issues are reported for years sometimes
> firsthand to us
> in the timer subsystem because this is the place where we detect
> entering in idle
> with softirqs pending.
>
> >
> > > Therefore fix this from napi_schedule() itself with waking up
> > > ksoftirqd
> > > when softirqs are raised from task contexts.
> > >
> > > Reported-by: Paul Menzel <pmenzel@xxxxxxxxxxxxx>
> > > Closes: 354a2690-9bbf-4ccb-8769-fa94707a9340@xxxxxxxxxxxxx
> >
> > AFAIU, Closes tags should point to URLs not message IDs.
>
> Good point!
>
> >
> > If this is a fix, the subject line should be:
> >    [PATCH net]
>
> Ok.
>
> >
> > And there should be a Fixes tag referencing the SHA which caused
> > the
> > issue and the patch should CC stable.
>
> At least since bea3348eef27 ("[NET]: Make NAPI polling independent of
> struct
> net_device objects."). It's hard for me to be sure it's not older.

We saw this message at least on the following kernel versions as well:

- v6.1.90-rt (Debian -rt kernel)
- v6.1.120-rt (Debian -rt kernel)
- v6.1.119-rt45 (So yes, this is also affected)
- v6.1.120-rt47

Felix

>
>
> >
> > See:
> >
> > https://www.kernel.org/doc/html/v6.13/process/maintainer-netdev.html#netdev-faq
>
> Thanks.

--
Siemens AG
Linux Expert Center
Friedrich-Ludwig-Bauer-Str. 3
85748 Garching, Germany