Re: [RFC PATCH net-next 1/2] net: napi: Fix interrupts permanently disabled during busy poll
From: Björn Töpel
Date: Wed Apr 29 2026 - 08:14:00 EST
Dragos Tatulea <dtatulea@xxxxxxxxxx> writes:
> On Tue, Apr 28, 2026 at 05:38:44PM -0700, Jakub Kicinski wrote:
>> On Tue, 28 Apr 2026 17:51:30 +0000 Dragos Tatulea wrote:
>> > + local_irq_save(flags);
>> > + hrtimer_start(&napi->timer, ns_to_ktime(timeout),
>> > + HRTIMER_MODE_REL_PINNED);
>> > clear_bit(NAPI_STATE_SCHED, &napi->state);
>> > + local_irq_restore(flags);
>>
>> I don't think disabling IRQ is necessary?
>> Isn't it legal to clear the bit first then schedule the timer?
>> The timer does not own the napi instance.
>
> Isn't the following scenario possible (but extremely unlikely)?
>
> 1. busy_poll_stop(): napi timer is schedueled.
> 2. Hard irq pre-empts busy_poll_stop() and takes an unusually long time.
> 4. napi timer triggers (also hard irq), napi_watchdog() skips schedule
> because NAPI_STATE_SCHED is set.
> 5. busy_poll_stop(): NAPI_STATE_SCHED gets cleared.
(Nice work finding the bug! I had to jog my memory to understand the
gnarly busy-poll details again!)
You're right that you need the save/restore if you do arm timer/clear,
but not if you do what Jakub suggests.
clear_bit(...);
hrtimer_start();
That would also follow the scheme in napi_schedule_done(). (Note that
swapping arm/clear does mean that we can get a wasted-timer outcome.)
Björn