Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll

From: Dragos Tatulea

Date: Mon May 04 2026 - 07:31:14 EST


On Wed, Apr 29, 2026 at 12:43:54PM +0000, Dragos Tatulea wrote:
> On Wed, Apr 29, 2026 at 02:13:43PM +0200, Björn Töpel wrote:
> > Dragos Tatulea <dtatulea@xxxxxxxxxx> writes:
> >
> > > On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
> > >> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
> > >> > + local_irq_save(flags);
> > >> > + hrtimer_start(&napi->timer, ns_to_ktime(timeout),
> > >> > + HRTIMER_MODE_REL_PINNED);
> > >> > clear_bit(NAPI_STATE_SCHED, &napi->state);
> > >> > + local_irq_restore(flags);
> > >>
> > >> I don't think disabling IRQ is necessary?
> > >> Isn't it legal to clear the bit first then schedule the timer?
> > >> The timer does not own the napi instance.
> > >
> > > Isn't the following scenario possible (but extremely unlikely)?
> > >
> > > 1. busy_poll_stop(): napi timer is schedueled.
> > > 2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
> > > 4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
> > > because NAPI_STATE_SCHED is set.
> > > 5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.
> >
> > (Nice work finding the bug! I had to jog my memory to understand the
> > gnarly busy-poll details again!)
> >
> > You're right that you need the save/restore if you do arm timer/clear,
> > but not if you do what Jakub suggests.
> >
> > clear_bit(...);
> > hrtimer_start();
> >
> Ah sorry, I didn't read it right the first time because I was too fixed
> on the order.
>
> > That would also follow the scheme in napi_schedule_done(). (Note that
> > swapping arm/clear does mean that we can get a wasted-timer outcome.)
> >
> If it is wasted is fine. I figured that this could end up with a timer
> scheduled after the napi deletion:
>
> 1. busy_poll_stop(): clears SCHED bit.
> 2. napi_disable() runs past SCHED wait, cancels timer and caller
> deletes napi mem.
> 3. busy_poll_stop(): napi->timer is armed but napi is freed memory
> by now.
>
> What am I missing here?
>
Gentle ping. Once I understand how/why the above is incorrect I can
update the patch and send it.

Thanks,
Dragos