Re: dma_sync_*_for_cpu and direction=TO_DEVICE (was Re: [PATCH 02/20] dma-mapping: provide a generic dma-noncoherent implementation)
From: Alexey Brodkin
Date: Fri May 18 2018 - 15:01:57 EST
Hi Russel,
On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
> It's necessary. Take a moment to think carefully about this:
>
> dma_map_single(, dir)
>
> dma_sync_single_for_cpu(, dir)
>
> dma_sync_single_for_device(, dir)
>
> dma_unmap_single(, dir)
>
> In the case of a DMA-incoherent architecture, the operations done at each
> stage depend on the direction argument:
>
> map for_cpu for_device unmap
> TO_DEV writeback none writeback none
> TO_CPU invalidate invalidate* invalidate invalidate*
> BIDIR writeback invalidate writeback invalidate
>
> * - only necessary if the CPU speculatively prefetches.
I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even
if CPU doesn't preferch - what if we reuse the same buffer for multiple
reads from DMA device?
> The multiple invalidations for the TO_CPU case handles different
> conditions that can result in data corruption, and for some CPUs, all
> four are necessary.
I would agree that map()/unmap() a quite a special cases and so depending
on direction we need to execute in them either for_cpu() or for_device()
call-backs depending on direction.
As for invalidation in case of for_device(TO_CPU) I still don't see
a rationale behind it. Would be interesting to see a real example where
we benefit from this.
> This is what is implemented for 32-bit ARM, depending on the CPU
> capabilities, as we have DMA incoherent devices and we have CPUs that
> speculatively prefetch data, and so may load data into the caches while
> DMA is in operation.
>
>
> Things get more interesting if the implementation behind the DMA API has
> to copy data between the buffer supplied to the mapping and some DMA
> accessible buffer:
>
> map for_cpu for_device unmap
> TO_DEV copy to dma none copy to dma none
> TO_CPU none copy to cpu none copy to cpu
> BIDIR copy to dma copy to cpu copy to dma copy to cpu
>
> So, in both cases, the value of the direction argument defines what you
> need to do in each call.
Interesting enough in your seond table (which describes more complicated
case indeed) you set "none" for for_device(TO_CPU) which looks logical
to me.
So IMHO that's what make sense:
---------------------------->8-----------------------------
map for_cpu for_device unmap
TO_DEV writeback none writeback none
TO_CPU none invalidate none invalidate*
BIDIR writeback invalidate writeback invalidate*
---------------------------->8-----------------------------
* is the case for prefetching CPU.
-AlexeyN§²æ¸yú²X¬¶ÇvØ)Þ{.nÇ·¥{±jZaj)íèjg¬±¨¶Ýj/ê¹ÞàÞ¨è&¢)ß«a¶Úþøéæ+v¨wè¥