Re: [PATCH] swiotlb: sync buffer when mapping FROM_DEVICE

From: Horia Geanta
Date: Thu May 23 2019 - 12:28:51 EST


On 5/23/2019 8:35 AM, Marek Szyprowski wrote:
> Hi Robin,
>
> On 2019-05-22 15:55, Robin Murphy wrote:
>> On 22/05/2019 14:34, Christoph Hellwig wrote:
>>> On Wed, May 22, 2019 at 02:25:38PM +0100, Robin Murphy wrote:
>>>> Sure, but that should be irrelevant since the effective problem here is in
>>>> the sync_*_for_cpu direction, and it's the unmap which nobbles the buffer.
>>>> If the driver does this:
>>>>
>>>>     dma_map_single(whole buffer);
>>>>     <device writes to part of buffer>
>>>>     dma_unmap_single(whole buffer);
>>>>     <contents of rest of buffer now undefined>
>>>>
>>>> then it could instead do this and be happy:
>>>>
>>>>     dma_map_single(whole buffer, SKIP_CPU_SYNC);
>>>>     <device writes to part of buffer>
>>>>     dma_sync_single_for_cpu(updated part of buffer);
>>>>     dma_unmap_single(whole buffer, SKIP_CPU_SYNC);
>>>>     <contents of rest of buffer still valid>
>>>
>>> Assuming the driver knows how much was actually DMAed this would
>>> solve the issue.  Horia, does this work for you?
In my particular case, the input is provided as a scatterlist, of which the
first N bytes are problematic (not written to by the device, and corrupted when
swiotlb bouncing is needed), while the remaining (Total - N) bytes are updated
by the device.

>>
>> Ohhh, and now I've just twigged what you were suggesting - your
>> DMA_ATTR_PARTIAL flag would mean "treat this as a read-modify-write of
>> the buffer because we *don't* know exactly which parts the device may
>> write to". So indeed if we did go down that route we wouldn't need any
>> of the sync stuff I was worrying about (but I might suggest naming it
>> DMA_ATTR_UPDATE instead). Apologies for being slow :)
>
> Don't we have DMA_BIDIRECTIONAL for such case? Maybe we should update
> documentation a bit to point that DMA_FROM_DEVICE expects the whole
> buffer to be filled by the device?
>
Or, put more bluntly, a driver must not rely on previous data in an area mapped
DMA_FROM_DEVICE. This limitation stems from the buffer bouncing mechanism of the
swiotlb DMA API backend; other backends (e.g. IOMMU) might not suffer from it.
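
To make the failure mode concrete, here is a small userspace toy model of the
bouncing behaviour described above. It is only a sketch, not kernel code: the
toy_* names, the 8-byte buffer, the 0xAA "stale slot" pattern and the
sync_on_map switch are all made up for illustration and do not correspond to
real swiotlb internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_LEN 8   /* arbitrary size for the toy buffer */
#define PREFIX  3   /* first N bytes the "device" never writes */

/* Hypothetical stand-in for a bounce slot; not the kernel's structure. */
struct toy_bounce {
	unsigned char slot[BUF_LEN];
};

/*
 * Model of mapping for DMA_FROM_DEVICE. The slot starts out holding
 * stale data (0xAA). Only when sync_on_map is set is the caller's
 * buffer copied into the slot first.
 */
static void toy_map_from_device(struct toy_bounce *b,
				const unsigned char *orig, int sync_on_map)
{
	memset(b->slot, 0xAA, BUF_LEN);
	if (sync_on_map)
		memcpy(b->slot, orig, BUF_LEN);
}

/* Model of the device writing only part of the mapped area. */
static void toy_device_write(struct toy_bounce *b, size_t off, size_t len,
			     unsigned char val)
{
	assert(off <= BUF_LEN && len <= BUF_LEN - off);
	memset(b->slot + off, val, len);
}

/* Model of unmap: the whole slot is copied back over the buffer. */
static void toy_unmap(const struct toy_bounce *b, unsigned char *orig)
{
	memcpy(orig, b->slot, BUF_LEN);
}

/* Returns 1 if the unwritten prefix survives map/write/unmap, else 0. */
static int toy_prefix_intact(int sync_on_map)
{
	unsigned char orig[BUF_LEN];
	struct toy_bounce b;
	size_t i;

	memset(orig, 0x11, BUF_LEN);            /* driver's previous data */
	toy_map_from_device(&b, orig, sync_on_map);
	toy_device_write(&b, PREFIX, BUF_LEN - PREFIX, 0x22);
	toy_unmap(&b, orig);

	for (i = 0; i < PREFIX; i++)
		if (orig[i] != 0x11)
			return 0;
	return 1;
}
```

In this model toy_prefix_intact(0) returns 0, since the first PREFIX bytes are
replaced by stale slot contents on unmap, while toy_prefix_intact(1) returns 1.
The sync_on_map=1 case roughly corresponds to what a DMA_BIDIRECTIONAL mapping
(or syncing the bounce buffer when mapping FROM_DEVICE) would give.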

Btw, the device I am working on (caam crypto engine) is deployed in several SoCs
configured differently - with or without an IOMMU (and coherent or non-coherent
etc.). IOW it's a "power user" of the DMA API and I appreciate all the help in
solving / clarifying these kinds of implicit assumptions.

Thanks,
Horia