Re: [PATCH] net: stmmac: close reset IRQ window and avoid double free
From: Russell King (Oracle)
Date: Fri Mar 20 2026 - 11:42:04 EST

On Fri, Mar 20, 2026 at 02:19:55PM +0800, yangg9 wrote:
> From: yangg9 <yangg9@xxxxxxxxxxxx>
>
> During reset, stmmac_reset_subtask() used to set STMMAC_DOWN before IRQs
> were freed in __stmmac_release(). That leaves a window where interrupts can
> still fire after the device is marked down, which may lead to interrupt
> storms while the interface is transitioning.
>
> Move stmmac_free_irq() earlier in the reset flow, before setting
> STMMAC_DOWN, so the reset path no longer has that interrupt window.
>
> Since IRQs are now released in stmmac_reset_subtask(), guard IRQ release in
> __stmmac_release() with STMMAC_DOWN to avoid a second free_irq() during the
> same reset sequence.
>
> This removes the interrupt-storm window in reset and prevents double IRQ
> release.

So, some points that need to be raised:

- What is the point of STMMAC_DOWN?

STMMAC_DOWN isn't set when the interface is administratively brought
down; the only place where this flag is set is stmmac_reset_subtask(),
which later clears it.

The flag appears to prevent stmmac_service_event_schedule() queueing
the service task while it's still operating, but STMMAC_SERVICE_SCHED
already does that.
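
To illustrate why the STMMAC_SERVICE_SCHED bit on its own is enough to
stop the service task being queued twice, here is a small userspace
model. It is only a sketch: the model_* names, the bit number and the
"queued" counter are made up for illustration, C11 atomics stand in for
the kernel's test_and_set_bit()/clear_bit(), and it assumes the bit is
cleared once the service task has finished its work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit number; only the name mirrors the driver's flag. */
enum { MODEL_SERVICE_SCHED = 0 };

struct model_priv {
	atomic_ulong state;	/* bitmask standing in for priv->state */
	int queued;		/* how many times the work item was queued */
};

static void model_init(struct model_priv *priv)
{
	atomic_init(&priv->state, 0UL);
	priv->queued = 0;
}

/* Userspace stand-in for test_and_set_bit(): returns the old bit value. */
static bool model_test_and_set_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static void model_clear_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Queue the service task, guarded only by the SERVICE_SCHED bit. */
static void model_service_event_schedule(struct model_priv *priv)
{
	if (!model_test_and_set_bit(MODEL_SERVICE_SCHED, &priv->state))
		priv->queued++;	/* stands in for queue_work() */
}

/* Service task body: do the work, then permit rescheduling. */
static void model_service_task(struct model_priv *priv)
{
	/* The task only ever runs after having been scheduled. */
	assert(atomic_load(&priv->state) & (1UL << MODEL_SERVICE_SCHED));

	/* ... reset handling would run here ... */

	model_clear_bit(MODEL_SERVICE_SCHED, &priv->state);
}
```

In this model, calling model_service_event_schedule() twice before
model_service_task() has run queues the work once; a further call after
the task has completed queues it again. No second "down" flag is needed
for that.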

It also prevents interrupts being serviced, which causes your
interrupt storm. However, does this matter? Surely stmmac_release()
can already cope with the interrupt handlers being active, since
taking an interface administratively down means operating on it while
it is still active, when a packet may be received at any time.

It's also used in stmmac_xdp_xmit() and stmmac_xsk_wakeup() to block
further processing in those paths. However, for stmmac_xsk_wakeup()
the only path which calls stmmac_service_event_schedule() is
stmmac_global_err() which naughtily calls netif_carrier_off() behind
phylink's back, which will corrupt phylink's state and lead to
phylink API calls being made in weird orders to the driver (this
needs to die.) However, stmmac_xsk_wakeup() checks whether the
carrier is on as well, which is a duplicate check.

So, here's the question: do we need to test STMMAC_DOWN in the
interrupt handlers at all? Can we delete those tests? As you seem
to have a way of triggering the reset subtask, please try removing
those tests from the interrupt handlers, thus simplifying the code
rather than trying a more complex solution.
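
The difference between the two handler behaviours can be sketched with
a toy model. This is not driver code: the model_* names are
hypothetical, and it assumes a level-triggered interrupt source that
stays asserted until the handler actually services it, which is the
usual way an early return turns into a storm. Whether the real
hardware behaves this way needs confirming by testing.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	bool down;		/* stands in for the STMMAC_DOWN bit */
	bool irq_pending;	/* level source: asserted until serviced */
	int handler_calls;	/* how many times the handler was entered */
};

/*
 * Interrupt handler model. With check_down set, it returns early while
 * the device is marked down and leaves the source asserted.
 */
static void model_interrupt(struct model_dev *dev, bool check_down)
{
	dev->handler_calls++;

	if (check_down && dev->down)
		return;			/* nothing serviced, still pending */

	dev->irq_pending = false;	/* servicing deasserts the source */
}

/*
 * Re-deliver the interrupt for as long as it stays asserted, up to a
 * budget so the model terminates. Returns the number of handler calls.
 */
static int model_deliver(struct model_dev *dev, bool check_down, int budget)
{
	assert(budget > 0);

	dev->handler_calls = 0;
	while (dev->irq_pending && dev->handler_calls < budget)
		model_interrupt(dev, check_down);

	return dev->handler_calls;
}
```

With the device marked down and an interrupt pending, the variant that
tests the flag uses up the whole budget and leaves the interrupt
pending, whereas the variant without the test services it on the first
call.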

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!