Re: [net v3] net: ethernet: mtk_eth_soc: handle dma buffer size soc specific

From: Bc-bocun Chen (陳柏村)
Date: Wed Jun 05 2024 - 22:43:50 EST


On Tue, 2024-06-04 at 15:25 -0700, Jacob Keller wrote:
>
> On 6/3/2024 12:25 PM, Frank Wunderlich wrote:
> > @@ -1142,40 +1142,46 @@ static int mtk_init_fq_dma(struct mtk_eth *eth)
> >  					       cnt * soc->tx.desc_size,
> >  					       &eth->phy_scratch_ring,
> >  					       GFP_KERNEL);
> > +
> >  	if (unlikely(!eth->scratch_ring))
> >  		return -ENOMEM;
> >  
> > -	eth->scratch_head = kcalloc(cnt, MTK_QDMA_PAGE_SIZE, GFP_KERNEL);
> > -	if (unlikely(!eth->scratch_head))
> > -		return -ENOMEM;
> > +	phy_ring_tail = eth->phy_scratch_ring + soc->tx.desc_size * (cnt - 1);
> >  
> > -	dma_addr = dma_map_single(eth->dma_dev,
> > -				  eth->scratch_head, cnt * MTK_QDMA_PAGE_SIZE,
> > -				  DMA_FROM_DEVICE);
> > -	if (unlikely(dma_mapping_error(eth->dma_dev, dma_addr)))
> > -		return -ENOMEM;
> > +	for (j = 0; j < DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH); j++) {
> > +		len = min_t(int, cnt - j * MTK_FQ_DMA_LENGTH, MTK_FQ_DMA_LENGTH);
> > +		eth->scratch_head[j] = kcalloc(len, MTK_QDMA_PAGE_SIZE, GFP_KERNEL);
> >  
> > -	phy_ring_tail = eth->phy_scratch_ring + soc->tx.desc_size * (cnt - 1);
> > +		if (unlikely(!eth->scratch_head[j]))
> > +			return -ENOMEM;
> >  
> > -	for (i = 0; i < cnt; i++) {
> > -		dma_addr_t addr = dma_addr + i * MTK_QDMA_PAGE_SIZE;
> > -		struct mtk_tx_dma_v2 *txd;
> > +		dma_addr = dma_map_single(eth->dma_dev,
> > +					  eth->scratch_head[j], len * MTK_QDMA_PAGE_SIZE,
> > +					  DMA_FROM_DEVICE);
> >  
> > -		txd = eth->scratch_ring + i * soc->tx.desc_size;
> > -		txd->txd1 = addr;
> > -		if (i < cnt - 1)
> > -			txd->txd2 = eth->phy_scratch_ring +
> > -				    (i + 1) * soc->tx.desc_size;
> > +		if (unlikely(dma_mapping_error(eth->dma_dev, dma_addr)))
> > +			return -ENOMEM;
> >  
> > -		txd->txd3 = TX_DMA_PLEN0(MTK_QDMA_PAGE_SIZE);
> > -		if (MTK_HAS_CAPS(soc->caps, MTK_36BIT_DMA))
> > -			txd->txd3 |= TX_DMA_PREP_ADDR64(addr);
> > -		txd->txd4 = 0;
> > -		if (mtk_is_netsys_v2_or_greater(eth)) {
> > -			txd->txd5 = 0;
> > -			txd->txd6 = 0;
> > -			txd->txd7 = 0;
> > -			txd->txd8 = 0;
> > +		for (i = 0; i < cnt; i++) {
> > +			struct mtk_tx_dma_v2 *txd;
> > +
> > +			txd = eth->scratch_ring + (j * MTK_FQ_DMA_LENGTH + i) * soc->tx.desc_size;
> > +			txd->txd1 = dma_addr + i * MTK_QDMA_PAGE_SIZE;
> > +			if (j * MTK_FQ_DMA_LENGTH + i < cnt)
> > +				txd->txd2 = eth->phy_scratch_ring +
> > +					    (j * MTK_FQ_DMA_LENGTH + i + 1) * soc->tx.desc_size;
> >  
> > +			txd->txd3 = TX_DMA_PLEN0(MTK_QDMA_PAGE_SIZE);
> > +			if (MTK_HAS_CAPS(soc->caps, MTK_36BIT_DMA))
> > +				txd->txd3 |= TX_DMA_PREP_ADDR64(dma_addr + i * MTK_QDMA_PAGE_SIZE);
> > +
> > +			txd->txd4 = 0;
> > +			if (mtk_is_netsys_v2_or_greater(eth)) {
> > +				txd->txd5 = 0;
> > +				txd->txd6 = 0;
> > +				txd->txd7 = 0;
> > +				txd->txd8 = 0;
> > +			}
>
> This block of changes was a bit hard to follow, but I think the result
> is that you end up allocating a different set of scratch_head buffers
> per size, vs. the original only having one scratch_head per device?
>
> Perhaps you can explain, but we're now allocating a bunch of different
> scratch_head pointers. However, in the patch, the only places that
> modify scratch_head appear to be the allocation path and the free
> path, and I can't quite see how that would impact the users of
> scratch_head. I guess it changes the dma_addr, which then changes the
> txd values we program?

In our hardware design, we need to allocate a large number of fq_dma
buffers for buffering in the hardware-accelerated path. Each fq_dma
buffer requires 2048 bytes of memory from the kernel. However, the
driver can only request up to 4 MB of contiguous memory at a time if we
want to avoid requesting a large contiguous region from the CMA
allocator. Therefore, in the previous driver code, we could only
allocate 2048 fq_dma buffers (2048 * 2048 bytes = 4 MB).

With the MT7988, the Ethernet bandwidth has increased to 2*10 Gbps,
which means we need to allocate more fq_dma buffers (increased to 4096)
to handle the buffering. Consequently, we need to modify the driver
code to allocate multiple contiguous memory regions and chain them into
the fq_dma ring.
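
To illustrate the arithmetic, here is a small stand-alone user-space
sketch (not driver code; PAGE_BYTES, MAX_CONTIG_BYTES, CHUNK_LEN and
show_split() are made-up names standing in for MTK_QDMA_PAGE_SIZE, the
4 MB contiguous-allocation limit and MTK_FQ_DMA_LENGTH) showing how the
buffer count is split into chunks the same way DIV_ROUND_UP() and
min_t() do it in mtk_init_fq_dma():

/*
 * Illustrative sketch only, compiled in user space.
 */
#include <stdio.h>

#define PAGE_BYTES       2048                 /* bytes per fq_dma buffer   */
#define MAX_CONTIG_BYTES (4 * 1024 * 1024)    /* 4 MB contiguous limit     */
#define CHUNK_LEN        (MAX_CONTIG_BYTES / PAGE_BYTES)   /* 2048 buffers */

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static void show_split(int total_bufs)
{
	int chunks = DIV_ROUND_UP(total_bufs, CHUNK_LEN);
	int j;

	printf("%d buffers -> %d contiguous allocation(s)\n", total_bufs, chunks);
	for (j = 0; j < chunks; j++) {
		/* same clamping as min_t(int, cnt - j * LEN, LEN) */
		int len = total_bufs - j * CHUNK_LEN;

		if (len > CHUNK_LEN)
			len = CHUNK_LEN;
		printf("  chunk %d: %d buffers, %d bytes\n",
		       j, len, len * PAGE_BYTES);
	}
}

int main(void)
{
	show_split(2048);	/* old limit: a single 4 MB allocation */
	show_split(4096);	/* MT7988 case: two 4 MB allocations   */
	return 0;
}

With 4096 buffers this prints two chunks of 2048 buffers (4 MB each),
which is what the j loop in the patch walks over when filling the
scratch ring.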

> Ok.
>
> I sort of understand what's going on here, but it was a fair bit of
> effort to fully grok this flow.
>
> Overall, I'm no expert on the part or DMA here, but:
>
> Reviewed-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>