On Wed, Dec 04, 2024 at 04:58:34PM +0100, Thierry Reding wrote:
This doesn't match the location from earlier, but at least there's
something afoot here that needs fixing. I suppose this could simply be
hiding any subsequent errors, so once this is fixed we might see other
similar issues.
Well, having a quick look at this, the first thing which stands out is:
In stmmac_tx_clean(), we have:
if (likely(tx_q->tx_skbuff_dma[entry].buf &&
tx_q->tx_skbuff_dma[entry].buf_type != STMMAC_TXBUF_T
_XDP_TX)) {
if (tx_q->tx_skbuff_dma[entry].map_as_page)
dma_unmap_page(priv->device,
tx_q->tx_skbuff_dma[entry].buf,
tx_q->tx_skbuff_dma[entry].len,
DMA_TO_DEVICE);
else
dma_unmap_single(priv->device,
tx_q->tx_skbuff_dma[entry].buf,
tx_q->tx_skbuff_dma[entry].len,
DMA_TO_DEVICE);
tx_q->tx_skbuff_dma[entry].buf = 0;
tx_q->tx_skbuff_dma[entry].len = 0;
tx_q->tx_skbuff_dma[entry].map_as_page = false;
}
So, tx_skbuff_dma[entry].buf is expected to point appropriately to the
DMA region.
Now if we look at stmmac_tso_xmit():
des = dma_map_single(priv->device, skb->data, skb_headlen(skb),
DMA_TO_DEVICE);
if (dma_mapping_error(priv->device, des))
goto dma_map_err;
if (priv->dma_cap.addr64 <= 32) {
...
} else {
...
des += proto_hdr_len;
...
}
tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_headlen(skb);
tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
This will result in stmmac_tx_clean() calling dma_unmap_single() using
"des" and "skb_headlen(skb)" as the buffer start and length.
One of the requirements of the DMA mapping API is that the DMA handle
returned by the map operation will be passed into the unmap function.
Not something that was offset. The length will also be the same.
We can clearly see above that there is a case where the DMA handle has
been offset by proto_hdr_len, and when this is so, the value that is
passed into the unmap operation no longer matches this requirement.
So, a question to the reporter - what is the value of
priv->dma_cap.addr64 in your failing case? You should see the value
in the "Using %d/%d bits DMA host/device width" kernel message.