Re: [PATCH 4/9] dmaengine: dw-edma: HDMA: Add memory barrier before starting the DMA transfer in remote setup

From: Köry Maincent
Date: Tue Sep 12 2023 - 04:55:50 EST


Hello Serge,

I am back with an answer from the hardware design side:
> "Even though the PCIe itself respects the transactions ordering, the
> AXI bus does not have an end-to-end completion acknowledgement (it
> terminates at the PCIe EP boundary with bus), and does not guarantee
> ordering if accessing different destinations on the Bus. So, an access to LL
> could be declared complete even though the transaction is still being
> pipelined in the AXI Bus. (a dozen or so clocks, I can give an accurate
> number if needed)
>
> The access to DMA registers is done through BAR0 “rolling”
> so the transaction does not actually go out on the AXI bus and
> looped-back to PCIe DMA, rather it stays inside the PCIe EP.
>
> For the above reasons, hypothetically, there’s a chance that even if the DMA
> LL is accessed before the DM DB from PCIe RC side, the DB could be updated
> before the LL in local memory."

On Thu, 22 Jun 2023 19:22:20 +0300
Serge Semin <fancer.lancer@xxxxxxxxx> wrote:

> If we get assured that hardware with such problem exists (if you'll get
> confirmation about the supposition 3. above) then we'll need to
> activate your trick for that hardware only. Adding dummy reads for all
> the remote eDMA setups doesn't look correct since it adds additional
> delay to the execution path and especially seeing nobody has noticed
> and reported such problem so far (for instance Gustavo didn't see the
> problem on his device otherwise he would have fixed it).
>
> So if assumption 3. is correct then I'd suggest the next
> implementation: add a new dw_edma_chip_flags flag defined (a.k.a
> DW_EDMA_SLOW_MEM), have it specified via the dw_edma_chip.flags field
> in the Akida device probe() method and activate your trick only if
> that flag is set.

The flag you suggested is about slow memory writes but, as said above, the
issue comes from the AXI bus and not from the memory. I am wondering why you
don't see this issue. If I understand correctly, it should be present on all
IPs, as the DMA registers are internal to the IP and the LL memory is external,
behind the AXI bus. Did you stress your IP? On my side it appears with lots of
operations issued by several (at least 3) threads through 2 DMA channels.

Köry