Re:Re: [PATCH v2] net: stmmac: fix fatal bus error on resume by reinitializing RX buffers

From: Ding Hui

Date: Tue Jun 02 2026 - 05:45:16 EST


At 2026-05-28 20:02:02, "Paolo Abeni" <pabeni@xxxxxxxxxx> wrote:

>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 3591755ea30b..0dc27d8c66a0 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -1642,6 +1642,79 @@ static void stmmac_clear_descriptors(struct stmmac_priv *priv,
>> stmmac_clear_tx_descriptors(priv, dma_conf, queue);
>> }
>>
>> +/**
>> + * stmmac_reinit_rx_descriptors - re-program RX descriptors from existing
>> + * buffers (allocation-free)
>
>[ ... ]
>
>> +static void stmmac_reinit_rx_descriptors(struct stmmac_priv *priv,
>> + struct stmmac_dma_conf *dma_conf,
>> + u32 queue)
>> +{
>> + struct stmmac_rx_queue *rx_q = &dma_conf->rx_queue[queue];
>> + int i;
>> +
>> + for (i = 0; i < dma_conf->dma_rx_size; i++) {
>> + struct stmmac_rx_buffer *buf = &rx_q->buf_pool[i];
>> + struct dma_desc *p = stmmac_get_rx_desc(priv, rx_q, i);
>> +
>> + if (rx_q->xsk_pool) {
>> + dma_addr_t dma_addr;
>> +
>> + /* The XSK pool may not be fully populated (e.g.
>> + * xdpsock TX-only); skip empty slots.
>> + */
>> + if (!buf->xdp)
>> + continue;
>
>[High]
>Does the continue here regress to the same fatal bus error this patch
>sets out to fix?
>
>When the slot is skipped, its descriptor address fields keep whatever
>HW writeback left behind. Then stmmac_clear_descriptors() runs right
>after stmmac_reinit_rx_descriptors() in stmmac_resume() and unconditionally
>re-arms OWN on every slot. For dwmac4 that is dwmac4_set_rx_owner():
>
> p->des3 |= cpu_to_le32(RDES3_OWN | RDES3_BUFFER1_VALID_ADDR |
> RDES3_INT_ON_COMPLETION_EN);
>
>and for enh_desc / ndesc the equivalent OR of RDES0_OWN / RDES1 fields.
>None of these clear the address fields.
>
>When stmmac_hw_setup() restarts the DMA, the controller sees an owned
>descriptor with VALID_ADDR set and a stale address — the exact failure
>mode the commit message describes. Could the skipped slots be cleared
>(stmmac_clear_desc() or zeroing des0/des1/des2) instead of left intact,
>or arranged so clear_descriptors() does not arm OWN on them?

It's right that the skipped slots must be cleared instead of left intact.
This change will be reflected in version v3.

>
>> +
>> + dma_addr = xsk_buff_xdp_get_dma(buf->xdp);
>> + stmmac_set_desc_addr(priv, p, dma_addr);
>> + stmmac_set_desc_sec_addr(priv, p, 0, false);
>> + } else {
>> + /* Theoretically unreachable: napi_disable() in
>> + * stmmac_suspend() ensures all initialized slots
>> + * have a valid page before we get here.
>> + * Defensive check only.
>> + */
>> + if (!buf->page)
>> + continue;
>
>[High]
>Is the comment "Theoretically unreachable: napi_disable() in
>stmmac_suspend() ensures all initialized slots have a valid page"
>accurate?
>
>stmmac_rx() assigns buf->page = NULL at multiple points when handing the
>page on to the stack or to XDP, for example:
>
> buf->page = NULL;
>
>in stmmac_rx() after the page has been consumed.
>
>stmmac_rx_refill() allocates with GFP_ATOMIC | __GFP_NOWARN and breaks
>on allocation failure without advancing dirty_rx, leaving the failing
>slot (and any subsequent slots) with buf->page == NULL:
>
> if (!buf->page) {
> buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> if (!buf->page)
> break;
> ...
> }
>
>stmmac_disable_all_queues() only does synchronize_rcu() and
>napi_disable() — it does not retry refills. Under the memory-pressure
>scenario this patch is explicitly designed to make safe, a prior
>GFP_ATOMIC failure can leave the ring with NULL-page slots at the
>moment suspend happens.
>
>If such a slot is then skipped here, the same OWN-with-stale-address
>condition described above applies on resume. Should the page_pool path
>also clear the descriptor for skipped slots, or allocate a replacement
>buffer rather than leave a stale-address descriptor for
>stmmac_clear_descriptors() to arm?
>
>> +
>> + stmmac_set_desc_addr(priv, p, buf->addr);
>> + stmmac_set_desc_sec_addr(priv, p, buf->sec_addr,
>> + priv->sph_active &&
>> + buf->sec_page);
>> +
>> + if (dma_conf->dma_buf_sz == BUF_SIZE_16KiB)
>> + stmmac_init_desc3(priv, p);
>> + }
>> + }
>
>[ ... ]
>
>> +}
>> +

Same as above, the skipped slots must be cleared instead of left intact.
This change will be reflected in version v3.

Thanks for the review. I will upload the v3 version soon.

Ding Hui