Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

From: Tomasz Figa
Date: Thu Nov 29 2018 - 12:24:37 EST


[CC Marek]

On Thu, Nov 29, 2018 at 9:09 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Thu, Nov 29, 2018 at 5:57 PM Christoph Hellwig <hch@xxxxxx> wrote:
> > On Thu, Nov 29, 2018 at 05:28:07PM +0100, Daniel Vetter wrote:
> > > Just spend a bit of time reading through the implementations already
> > > merged. Is the struct device *dev parameter actually needed anywhere?
> > > dma-api definitely needs it, because we need that to pick the right iommu.
> > > But for cache management from what I've seen the target device doesn't
> > > matter, all the target specific stuff will be handled by the iommu.
> >
> > It looks like only mips every uses the dev argument, and even there
> > the function it passes dev to ignores it. So it could probably be removed.
> >

Whether the cache maintenance operation needs to actually do anything
or not is a function of `dev`. We can have some devices that are
coherent with CPU caches, and some that are not, on the same system.

> > >
> > > Dropping the dev parameter would make this a perfect fit for coherency
> > > management of buffers used by multiple devices. Right now there's all
> > > kinds of nasty tricks for that use cases needed to avoid redundant
> > > flushes.
> >
> > Note that one thing I'd like to avoid is exposing these funtions directly
> > to drivers, as that will get us into all kinds of abuses.
>
> What kind of abuse do you expect? It could very well be that gpu folks
> call that "standard use case" ... At least on x86 with the i915 driver
> we pretty much rely on architectural guarantees for how cache flushes
> work very much. Down to userspace doing the cache flushing for
> mappings the kernel has set up.

i915 is a very specific case of a fully contained,
architecture-specific hardware subsystem, where you can just hardcode
all integration details inside the driver, because nobody else would
care.

In ARM world, you can have the same IP blocks licensed by multiple SoC
vendors with different integration details and that often includes the
option of coherency.

>
> > So I'd much prefer if we could have iommu APIs wrapping these that are
> > specific to actual uses cases that we understand well.
> >
> > As for the buffer sharing: at least for the DMA API side I want to
> > move the current buffer sharing users away from dma_alloc_coherent
> > (and coherent dma_alloc_attrs users) and the remapping done in there
> > required for non-coherent architectures. Instead I'd like to allocate
> > plain old pages, and then just dma map them for each device separately,
> > with DMA_ATTR_SKIP_CPU_SYNC passed for all but the first user to map
> > or last user to unmap. On the iommu side it could probably work
> > similar.
>
> I think this is what's often done. Except then there's also the issue
> of how to get at the cma allocator if your device needs something
> contiguous. There's a lot of that still going on in graphics/media.

I suppose one could just expose CMA with the default pool directly.
It's just an allocator, so not sure why it would need any
device-specific information.

There is also the use case of using CMA with device-specific pools of
memory reusable by the system when not used by the device and those
would have to somehow get the pool to allocate from, but I wonder if
struct device is the right way to pass such information. I'd see the
pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
and perhaps a wrapper cma_alloc_default(size, flags) that is just a
simple macro calling cma_alloc(&cma_pool_default, size, flags).

Best regards,
Tomasz