Re: [PATCH] stmmac: Fix incorrect dereference in stmmac_*_interrupt()

From: Serge Semin
Date: Wed Feb 07 2024 - 06:34:54 EST


On Tue, Feb 06, 2024 at 03:07:04PM +0000, Simon Horman wrote:
> On Sat, Feb 03, 2024 at 06:03:21PM +0300, Pavel Sakharov wrote:
> > If 'dev' is NULL, the 'priv' variable has an incorrect address when
> > dereferencing calling netdev_err().
> >
> > Pass 'dev' instead of 'priv->dev" to the function.
> >
> > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> >
> > Signed-off-by: Pavel Sakharov <p.sakharov@xxxxxxxxx>
>
> Thanks Pavel,
>
> I agree with your analysis that this can result in a NULL dereference.
> And that your proposed fix is good: netdev_err() can handle a NULL
> dev argument.
>
> As this seems to be a fix I suggest it should be for net.
> And that it should be based on that tree and designated as such
> in the subject:
>
> Subject: [PATCH net] ...
>
> Also if it is a fix, it should have a fixes tag.
> Perhaps this one:
>
> Fixes: 8532f613bc78 ("net: stmmac: introduce MSI Interrupt routines for mac, safety, RX & TX")
>
>
> I don't think there is a need to respin for the above, though please
> keep this in mind when posting Networking patches in future.
>
>
> Looking at the patch above, and stmmac_main.c, it seems that the following
> functions also suffer from a similar problem:
>
> static irqreturn_t stmmac_msi_intr_tx(int irq, void *data)
> {
> struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)data;
> ...
> dma_conf = container_of(tx_q, struct stmmac_dma_conf, tx_queue[chan]);
> priv = container_of(dma_conf, struct stmmac_priv, dma_conf);
>
> if (unlikely(!data)) {
> netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
> ...
>
> And stmmac_msi_intr_rx(), which follows a similar pattern
> to stmmac_msi_intr_tx().
>
> I also note that in those functions "invalid dev pointer" seems misleading,
> perhaps it ought to be "invalid queue" pointer.
>
> As these problems seem to all have been introduced at the same time,
> perhaps it is appropriate to fix them all in one patch?

IMO we can just drop these and the noted in the patch checks. Before
actually making a decision about that there are three general
questions we'd need to answer to:

1. Do we trust the IRQ-subsystem to supply a correct cookie pointer
specified during the IRQ request procedure?

2. Do we trust the driver code for correctly performing the IRQs
request?

3. If we don't trust to any of that then what is caused if the problem
happens and there is no sanity checks implemented?

Here are my thoughts regarding that:

1. If no dev_id sanity checks implemented in the handlers then having
the IRQ requested with NULL argument passed even though the handlers
imply the netdev pointer will for sure cause troubles right away since
the driver won't work and the system will likely crash. So it will be
spotted during the initial test/debug stage of the respective change.

2. If for some reason the IRQ subsystem supplied a NULL pointer
instead of the cookie pointer, then something is really wrong and the
entire system likely won't work correctly. If this case is possible
to happen then all the kernel IRQ handlers should have been
implemented with such sanity check, which I don't see in practice.

3. If IRQ was caused by the DW *MAC controller, but NULL pointer is
passed and the handler returns IRQ_NONE state, then the actual IRQ
won't be handled AFAICS causing the spurious IRQs detected. Eventually
the IRQ will be effectively disabled. In anyway the driver will stop
working. But even if this happens see points 1. and 2. for the
problem cause implications.

4. The common IRQ handler doesn't have such check in the driver.
Though unlike the rest of the handlers it's assigned with the
IRQF_SHARED flag which requires the cookie pointer passed. Anyway the
sanity check was removed from the common IRQ handler in the commit
f42234ffd531 ("stmmac: fix pointer check after utilization in
stmmac_interrupt") with a justification as being _paranoidal_ and
pointless in the respective context. But in about a year afterwards
the individual IRQ handlers were introduced with the same check but
this time in a bit more reasonable context. Still it doesn't make
the check existence less paranoidal.

5. I took a look at first 20 Ethernet device drivers. None of them has
such checks implemented even though about half of them request IRQs as
non-shared (so the cookie pointer is optional).

6. Finally the checks are implemented in the hard IRQ handlers for
which the less code the better.

To sum all the above up from my point of view the checks are redundant
of course unless we turn on the paranoidal mode and stop trusting the
driver code and the IRQ subsystem.

-Serge(y)

>
> > ---
> > drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 4727f7be4f86..5ab5148013cd 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -5848,7 +5848,7 @@ static irqreturn_t stmmac_mac_interrupt(int irq, void *dev_id)
> > struct stmmac_priv *priv = netdev_priv(dev);
> >
> > if (unlikely(!dev)) {
> > - netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
> > + netdev_err(dev, "%s: invalid dev pointer\n", __func__);
> > return IRQ_NONE;
> > }
> >
> > @@ -5868,7 +5868,7 @@ static irqreturn_t stmmac_safety_interrupt(int irq, void *dev_id)
> > struct stmmac_priv *priv = netdev_priv(dev);
> >
> > if (unlikely(!dev)) {
> > - netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
> > + netdev_err(dev, "%s: invalid dev pointer\n", __func__);
> > return IRQ_NONE;
> > }
> >
> >
>