Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
From: Maciej W. Rozycki
Date: Wed May 13 2026 - 11:35:47 EST
On Tue, 12 May 2026, Jakub Kicinski wrote:
> > case FZA_STATE_UNINITIALIZED:
> > netif_carrier_off(dev);
> > - timer_delete_sync(&fp->reset_timer);
> > + timer_delete_sync_try(&fp->reset_timer);
> > fp->ring_cmd_index = 0;
> > fp->ring_uns_index = 0;
> > fp->ring_rmc_tx_index = 0;
> > @@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
> > fp->queue_active = 0;
> > netif_stop_queue(dev);
> > pr_debug("%s: queue stopped\n", fp->name);
> > - timer_delete_sync(&fp->reset_timer);
> > +
> > + spin_lock(&fp->lock);
> > + timer_delete(&fp->reset_timer);
>
> Noob q, why use timer_delete_sync_try() above ?
I guess it got lost in the lengthy commit description:
> [...] For the transition
> to the uninitialised state switch to timer_delete_sync_try() instead, so
> that a timer isn't deleted that's just been rearmed by the timer handler
> and needs to watch for the device to come out of reset again (again, an
> SMP scenario only).
Does it answer your question, or shall I rephrase it?
Actually I gave this piece a bit of extra thought before sending this
final version. I considered having extra state added and protected with
the data access spinlock and then using that state to refrain from
reissuing the reset if the change to the uninitialised state happened
literally as the reset timer handler has been entered.
But I concluded it was not worth it given that it's such an unlikely
state to arrive at. We're talking about recovery from failed recovery
here. So I chose to keep it simple: if the timer has already fired, then
instead just let it re-reset the device and rearm itself, and abandon the
timer deletion here, exactly what timer_delete_sync_try() does. After all
if the device state change happened a moment later, that's what would
happen anyway with either approach and we gave the device enough time
margin to come out of reset, so there's no need to be gentle.
FWIW I want to port this code sometime to the defxx driver, which handles
a newer generation of devices that are an evolution of and quite similar
in many respects to ones handled by this defza driver and which in
particular implement an analogous state machine handled by an RTOS running
on m68k hardware. While devices handled by the defxx driver are much
faster and their reset takes ~1s rather than ~10s, it's still quite a lot
and the original DEC code which just busy-waits in the hardirq context
results in a horrible OS latency spike one user has actually complained
about, as apparently it was quite a frequent event in their setup.
As the defxx driver can actually run on SMP hardware (and I have MIPS,
POWER9, RISC-V, and x86 SMP test systems for the defxx driver in addition
to a bunch of UP machines of various CPU architectures) at that point I
may revisit the choice made here and make an incremental change so that
both drivers use the same approach, but I don't think there's a need right
now. Then again, I might not, as it might be good enough a path of least
resistance approach for both drivers after all.
Maciej