Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
From: Jakub Kicinski
Date: Tue Apr 28 2026 - 20:32:14 EST
On Tue, 28 Apr 2026 20:04:13 -0400 Martin Karsten wrote:
> On 2026-04-28 19:40, Jakub Kicinski wrote:
> > On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> >> Under certain conditions a queue can be left with interrupts
> >> disabled and with the napi re-scheduling timer permanently stopped.
> >> This behaviour is triggered by the napi busy poll path when
> >> gro-flush-timeout and defer-hard-irqs are set. Here's a sequence of
> >> operations:
> >>
> >> 1. Busy poll starts, NAPI_STATE_SCHED is set to avoid rescheduling napi
> >> from the timer.
> >>
> >> 2. During napi poll, driver disables interrupts due to being in poll
> >> mode (napi_complete_done() returns false because napi->state has
> >> NAPIF_STATE_IN_BUSY_POLL set).
> >
> > Why does the driver have IRQs disabled in busy poll?
>
> The problem occurs in irq deferral mode when both gro-flush-timeout and
> defer-hard-irqs are nonzero and NIC interrupts are disabled.
Okay.
> >> 3. At the end of the busy poll (busy_poll_stop()):
> >> 3.1 napi timer is scheduled and skip_schedule is set (due to config)
> >> 3.2 napi->poll() is called:
> >> - driver poll() processes exactly budget packets
> >> and exits early => napi not scheduled.
> >> (interrupts are still disabled at this point)
> >> 3.3 Since napi poll processed budget packets, __busy_poll_stop()
> >> is called with skip_schedule set => napi is not scheduled here
> >> either.
> >
> > with skip_schedule it calls:
> >
> > clear_bit(NAPI_STATE_SCHED, &napi->state);
> >
> >> 4. If the napi timer from 3.1 gets to be triggered due to slow napi poll
> >> or some other reason, the timer will run with no effect (due to
> >> NAPI_STATE_SCHED being set).
> >
> > And here you claim STATE_SCHED is still set?
>
> Labelling this with number 4. might be misleading, sorry! The concern is
> that a short enough timer (compared to the duration of the driver poll)
> can be triggered before the NAPI_STATE_SCHED bit is cleared at the end
> of Step 3.3.
Ah. Just say that :D Two pages of buggy text, y'all would have been
better off using this one paragraph as the commit message.
Please don't use AI for generating commit messages if that's the cause.
It really is spectacularly shit at it.
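
For anyone following along, the window described in Martin's paragraph can
be sketched as a toy model. This is not kernel code: struct napi_model,
model_timer_fire() and model_busy_poll_stop() are hypothetical names made up
for illustration, and the model assumes the timer callback only schedules a
poll when it can take the SCHED bit itself, as the thread describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one NAPI instance; not kernel code. */
struct napi_model {
	bool sched;        /* stands in for NAPI_STATE_SCHED */
	bool irq_enabled;  /* device interrupt state */
	bool timer_armed;  /* re-scheduling timer is pending */
	bool poll_queued;  /* a follow-up poll has been scheduled */
};

/* Timer callback: schedules a poll only if it can take SCHED itself. */
static void model_timer_fire(struct napi_model *n)
{
	n->timer_armed = false;
	if (!n->sched) {
		n->sched = true;
		n->poll_queued = true;
	}
	/* else: SCHED is already owned, so the timer runs with no effect */
}

/*
 * Walk through the end of busy poll (step 3 in the quoted description).
 * Returns true if the queue ends up stuck: interrupts disabled, no timer
 * pending and no poll scheduled.
 */
static bool model_busy_poll_stop(bool timer_fires_during_poll)
{
	/* Busy poll owns SCHED; IRQs are off in deferral mode. */
	struct napi_model n = { .sched = true, .irq_enabled = false };

	assert(n.sched);

	/* 3.1: the timer is armed and skip_schedule is set. */
	n.timer_armed = true;

	/* 3.2: poll() consumes exactly budget, so IRQs stay disabled. */
	if (timer_fires_during_poll)
		model_timer_fire(&n);	/* short timer expires mid-poll */

	/* 3.3: skip_schedule path only clears SCHED. */
	n.sched = false;

	/* A timer that is still pending fires later and finds SCHED clear. */
	if (n.timer_armed)
		model_timer_fire(&n);

	return !n.irq_enabled && !n.timer_armed && !n.poll_queued;
}
```

In this model, model_busy_poll_stop(true) reports a stuck queue because the
timer expired while SCHED was still held, and model_busy_poll_stop(false)
does not because the timer fires after the bit is cleared.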