Re: [PATCH net-next v2 2/2] net/stmmac: Fix free-after-use panic when interface goes down
From: Jakub Raczynski
Date: Wed Jun 03 2026 - 07:45:40 EST
Gonna respond to Sashiko comments.
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 3a66f28425275..2c6abd5efd55a 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -292,16 +292,18 @@ static void __stmmac_disable_all_queues(struct stmmac_priv *priv)
> > static void stmmac_disable_all_queues(struct stmmac_priv *priv)
> > {
> > u8 rx_queues_cnt = priv->plat->rx_queues_to_use;
> > - struct stmmac_rx_queue *rx_q;
> > u8 queue;
> >
> > - /* synchronize_rcu() needed for pending XDP buffers to drain */
> > for (queue = 0; queue < rx_queues_cnt; queue++) {
> Is this napi_synchronize() loop racy and redundant here?
> __stmmac_disable_all_queues() immediately calls napi_disable(), which
> internally sets NAPI_STATE_DISABLE and spins until NAPI safely exits. Since
> NAPI is not yet disabled during this loop, could a hardware interrupt fire
> and reschedule the NAPI immediately after napi_synchronize() returns?
> Additionally, does this loop fail to cover pure TX queues by only iterating
> up to rx_queues_cnt?
We wouldn't have this commit if that was a case, would we?
Problem is, after this function there is still possible traffic that
will panic the system.
synchronize_rcu() should be good enough for XDP but is not good enough
altogether. napi_synchronize() is proper tool here, as NAPI might be running
on multiple queues.
But are still IRQ's after this possible? As code is written currently,
there is following order:
- disable PHY
- disable NAPI (this), earlier synchronize_rcu()
- disable TX
- disable IRQ's
In original patch, we added synchronize_rcu() after freeing IRQ's as these
were possible, meaning after disabling NAPI there was still IRQ.
In this version we trigger NAPI synchronize which should consume all data,
but IRQ after this might still be possible? It would be very VERY unlucky.
I think napi_synchronize() will consume so much time here that all data
should be processed after disabling PHY.
> > - rx_q = &priv->dma_conf.rx_queue[queue];
> > - if (rx_q->xsk_pool) {
> > - synchronize_rcu();
> > - break;
> > + struct stmmac_channel *ch = &priv->channel[queue];
> > +
> > + if (stmmac_xdp_is_enabled(priv) &&
> > + test_bit(queue, priv->af_xdp_zc_qps)) {
> > + napi_synchronize(&ch->rxtx_napi);
> Does removing synchronize_rcu() introduce a regression in XDP redirect
> synchronization?
> In network drivers, stmmac_xdp_xmit() is called by other CPUs in their own
> NAPI contexts when processing XDP_REDIRECT packets, which is protected by
> rcu_read_lock().
> By using a local napi_synchronize(), the driver only waits for its own NAPI
> thread to stop running. Does this leave zero synchronization against other
> CPUs concurrently executing stmmac_xdp_xmit(), causing the exact
> Use-After-Free on TX rings this patch aims to fix when
> free_dma_desc_resources() zeroes out the tx_skbuff arrays?
Not sure I agree but I would gladly be enlightened here.
> > + } else {
> > + napi_synchronize(&ch->rx_napi);
> > }
> > +
> > }
> >
> > __stmmac_disable_all_queues(priv);
> This is a pre-existing issue, but does stmmac_release() fail to cancel
> priv->service_task, allowing the interface to unexpectedly come back up?
> When the interface is brought down normally, dev_close() invokes
> stmmac_release(), but cancel_work_sync(&priv->service_task) is not called.
> If a hardware error schedules service_task just before dev_close() takes
> the rtnl_lock, it will block on the lock and run immediately after the
> interface is down. Because STMMAC_DOWN is not set, it will bypass the
> test_bit(STMMAC_DOWN, &priv->state) check, execute dev_close(), and then
> incorrectly call dev_open() via stmmac_reset_subtask().
Scenario where 'ifconfig down' + queue error/timeout happen at the same time?
Sure, might fix this if requested, but this is the risk I may live with,
it is impossible IMO.
BR
Jakub Raczynski