Re: [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers

From: Paolo Abeni

Date: Thu May 28 2026 - 08:07:06 EST


From: AI Reviewer <ai@xxxxxxxxxxx>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: stmmac: fix fatal bus error on resume by reinitializing RX buffers

This patch adds stmmac_reinit_rx_descriptors() and calls it from
stmmac_resume() before stmmac_clear_descriptors() to re-program the RX
descriptor buffer-address fields from buffers that survived across the
PM transition, avoiding any allocation in the resume path.

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 3591755ea30b..0dc27d8c66a0 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1642,6 +1642,79 @@ static void stmmac_clear_descriptors(struct stmmac_priv *priv,
> stmmac_clear_tx_descriptors(priv, dma_conf, queue);
> }
>
> +/**
> + * stmmac_reinit_rx_descriptors - re-program RX descriptors from existing
> + * buffers (allocation-free)

[ ... ]

> +static void stmmac_reinit_rx_descriptors(struct stmmac_priv *priv,
> + struct stmmac_dma_conf *dma_conf,
> + u32 queue)
> +{
> + struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
> + int i;
> +
> + for (i = 0; i < dma_conf->dma_rx_size; i++) {
> + struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
> + struct dma_desc *p = stmmac_get_rx_desc(priv, rx_q, i);
> +
> + if (rx_q->xsk_pool) {
> + dma_addr_t dma_addr;
> +
> + /* The XSK pool may not be fully populated (e.g.
> + * xdpsock TX-only); skip empty slots.
> + */
> + if (!buf->xdp)
> + continue;

[High]
Does the continue here regress to the same fatal bus error this patch
sets out to fix?

When the slot is skipped, its descriptor address fields keep whatever
HW writeback left behind. Then stmmac_clear_descriptors() runs right
after stmmac_reinit_rx_descriptors() in stmmac_resume() and unconditionally
re-arms OWN on every slot. For dwmac4 that is dwmac4_set_rx_owner():

p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR |
RDES3_INT_ON_COMPLETION_EN);

and for enh_desc / ndesc the equivalent OR of RDES0_OWN / RDES1 fields.
None of these clear the address fields.

When stmmac_hw_setup() restarts the DMA, the controller sees an owned
descriptor with VALID_ADDR set and a stale address — the exact failure
mode the commit message describes. Could the skipped slots be cleared
(stmmac_clear_desc() or zeroing des0/des1/des2) instead of left intact,
or arranged so clear_descriptors() does not arm OWN on them?

> +
> + dma_addr = xsk_buff_xdp_get_dma(buf->xdp);
> + stmmac_set_desc_addr(priv, p, dma_addr);
> + stmmac_set_desc_sec_addr(priv, p, 0, false);
> + } else {
> + /* Theoretically unreachable: napi_disable() in
> + * stmmac_suspend() ensures all initialized slots
> + * have a valid page before we get here.
> + * Defensive check only.
> + */
> + if (!buf->page)
> + continue;

[High]
Is the comment "Theoretically unreachable: napi_disable() in
stmmac_suspend() ensures all initialized slots have a valid page"
accurate?

stmmac_rx() assigns buf->page = NULL at multiple points when handing the
page on to the stack or to XDP, for example:

buf->page = NULL;

in stmmac_rx() after the page has been consumed.

stmmac_rx_refill() allocates with GFP_ATOMIC | __GFP_NOWARN and breaks
on allocation failure without advancing dirty_rx, leaving the failing
slot (and any subsequent slots) with buf->page == NULL:

if (!buf->page) {
buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
if (!buf->page)
break;
...
}

stmmac_disable_all_queues() only does synchronize_rcu() and
napi_disable() — it does not retry refills. Under the memory-pressure
scenario this patch is explicitly designed to make safe, a prior
GFP_ATOMIC failure can leave the ring with NULL-page slots at the
moment suspend happens.

If such a slot is then skipped here, the same OWN-with-stale-address
condition described above applies on resume. Should the page_pool path
also clear the descriptor for skipped slots, or allocate a replacement
buffer rather than leave a stale-address descriptor for
stmmac_clear_descriptors() to arm?

> +
> + stmmac_set_desc_addr(priv, p, buf->addr);
> + stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
> + priv->sph_active &&
> + buf->sec_page);
> +
> + if (dma_conf->dma_buf_sz == BUF_SIZE_16KiB)
> + stmmac_init_desc3(priv, p);
> + }
> + }

[ ... ]

> +}
> +
--
This is an AI-generated review.