Re: [RFC PATCH] net: stmmac: Fix the problem about interrupt storm
From: cathy cai
Date: Tue Apr 02 2024 - 22:01:26 EST
Hi Romain,
On Sun, Mar 31, 2024 at 4:35 PM Romain Gantois
<romain.gantois@xxxxxxxxxxx> wrote:
>
> Hello Cathy,
>
> On Wed, 27 Mar 2024, Cathy Cai wrote:
>
> > Tx queue time out then reset adapter. When reset the adapter, stmmac driver
> > sets the state to STMMAC_DOWN and calls dev_close() function. If an interrupt
> > is triggered at this instant after setting state to STMMAC_DOWN, before the
> > dev_close() call.
> >
> ...
> > - set_bit(STMMAC_DOWN, &priv->state);
> > dev_close(priv->dev);
> > + set_bit(STMMAC_DOWN, &priv->state);
> > dev_open(priv->dev, NULL);
> > clear_bit(STMMAC_DOWN, &priv->state);
> > clear_bit(STMMAC_RESETING, &priv->state);
>
> If this IRQ issue can happen whenever STMMAC_DOWN is set while the net device is
> open, then it could also happen between the dev_open() and
> clear_bit(STMMAC_DOWN) calls right? So you'd have to clear STMMAC_DOWN before
> calling dev_open() but then I don't see the usefulness of setting STMMAC_DOWN
> and clearing it immediately. Maybe closing and opening the net device should be
> enough?
>
Yes. It could also happen between the dev_open() and
clear_bit(STMMAC_DOWN) calls.
Although we did not reproduce this scenario, it should have happened
if we had increased
the number of test samples. In addition, I found that other people had
similar problems before.
The link is:
https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@xxxxxxxxxxxxxxxxxxxx/
>
> Moreover, it seems strange to me that stmmac_interrupt() unconditionnally
> ignores interrupts when the driver is in STMMAC_DOWN state. This seems like
> dangerous behaviour, since it could cause IRQ storm issues whenever something
> in the driver sets this state. I'm not too familiar with the interrupt handling
> in this driver, but maybe stmmac_interrupt() could clear interrupts
> unconditionnally in the STMMAC_DOWN state?
>
Clear interrupts unconditionally in the STMMAC_DOWN state directly
certainly won't cause this problem.
This may be too rough, maybe this design has other considerations.
>
> Best Regards,
>
> --
> Romain Gantois, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
Best Regards,
Cathy