Re: [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes

From: Liam Mark
Date: Tue Jan 22 2019 - 17:50:07 EST


On Tue, 22 Jan 2019, Andrew F. Davis wrote:

> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>> ION the buffers aren't usually accessed from the CPU so this allows
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work. The cpu can still easily speculate into this area.
> >
> > Can you provide more detail on your concern here?
> > The use case I am thinking about here is a cached buffer which is accessed
> > by a non-IO-coherent device (quite a common use case for ION).
> >
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the
> > buffer from the CPU after the device has written to it. However, if you
> > know you aren't going to do any CPU access before the buffer is again
> > returned to the device, then I don't think the speculative access is a
> > concern.
> >
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> >
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> >
>
> These buffers are cacheable, not cached; if you haven't written anything,
> the data won't actually be in the cache.

That's true.

> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
>

This is the cost I am trying to avoid, and it comes back to our previous
discussion. We have a coherent system cache, so doing this for every cache
line of a large buffer adds up, both in the CMO work itself and in the
traffic going out to the bus.
For example, I believe 1080p buffers are 8 MB, and 4K buffers are even
larger.

I also still think you would want to solve this properly such that
invalidates aren't being done unnecessarily.
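To make the cost concrete: assuming a typical 64-byte cache line, an 8 MB
frame is on the order of 131,000 clean/invalidate operations per map/unmap
cycle. Below is a rough sketch of the kind of bypass I have in mind, using
the existing DMA_ATTR_SKIP_CPU_SYNC attribute from the DMA API; the dma-buf
plumbing to carry such attributes is what this patch proposes, so the helper
names here are purely illustrative.

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Illustrative only: map a dma-buf's sg_table while skipping the CPU
 * cache maintenance, for the case where the buffer is only ever
 * touched by the device and never by the CPU.
 */
static int example_map_device_only(struct device *dev, struct sg_table *sgt,
                                   enum dma_data_direction dir)
{
        int nents;

        /* Skip the per-cache-line clean/invalidate on map ... */
        nents = dma_map_sg_attrs(dev, sgt->sgl, sgt->orig_nents, dir,
                                 DMA_ATTR_SKIP_CPU_SYNC);
        if (nents <= 0)
                return -ENOMEM;
        sgt->nents = nents;
        return 0;
}

static void example_unmap_device_only(struct device *dev, struct sg_table *sgt,
                                      enum dma_data_direction dir)
{
        /* ... and on unmap, since the CPU never touches the data. */
        dma_unmap_sg_attrs(dev, sgt->sgl, sgt->orig_nents, dir,
                           DMA_ATTR_SKIP_CPU_SYNC);
}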

> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
>

I am not sure I would say it is broken: the large buffers (for example,
1080p buffers) are mapped and unmapped on every frame. I don't think there
is any clean way to avoid that in a pipelining framework; you could ask
clients to keep the buffers DMA mapped, but there isn't necessarily a good
time to tell them to unmap.

It would be unfortunate not to consider this a legitimate thing for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me either, as
there isn't necessarily a nice place to tell them when to detach.
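For reference, the per-frame pattern I am describing looks roughly like the
sketch below, written against the existing dma-buf API (error handling
trimmed; frame_dmabuf and my_dev are placeholders). The map and unmap calls
are where the cache maintenance currently happens on every single frame.

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

/* Sketch of the per-frame cycle a pipelining framework ends up doing. */
static int example_process_one_frame(struct dma_buf *frame_dmabuf,
                                     struct device *my_dev)
{
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        attach = dma_buf_attach(frame_dmabuf, my_dev);
        if (IS_ERR(attach))
                return PTR_ERR(attach);

        /* CMO happens here on every frame ... */
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(frame_dmabuf, attach);
                return PTR_ERR(sgt);
        }

        /* ... the device works on the buffer here ... */

        /* ... and CMO happens again here, once the frame is returned. */
        dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
        dma_buf_detach(frame_dmabuf, attach);
        return 0;
}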


Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project