Re: noveau vs arm dma ops

From: Daniel Vetter
Date: Wed Apr 25 2018 - 06:04:42 EST

On Wed, Apr 25, 2018 at 01:54:39AM -0700, Christoph Hellwig wrote:
> [discussion about this patch, which should have been cced to the iommu
> and linux-arm-kernel lists, but wasn't:
> On Wed, Apr 25, 2018 at 09:41:51AM +0200, Thierry Reding wrote:
> > > API from the iommu/dma-mapping code. Drivers have no business poking
> > > into these details.
> >
> > The interfaces that the above patch uses are all EXPORT_SYMBOL_GPL,
> > which is rather misleading if they are not meant to be used by drivers
> > directly.
> The only reason the DMA ops are exported is because get_arch_dma_ops
> references (or in case of the coherent ones used to reference). We
> don't let drivers assign random dma ops.
> >
> > > Thierry, please resend this with at least the iommu list and
> > > linux-arm-kernel in Cc to have a proper discussion on the right API.
> >
> > I'm certainly open to help with finding a correct solution, but the
> > patch above was purposefully terse because this is something that I
> > hope we can get backported to v4.16 to unbreak Nouveau. Coordinating
> > such a backport between ARM and DRM trees does not sound like something
> > that would help getting this fixed in v4.16.
> Coordinating the backport of a trivial helper in the arm tree is not
> the end of the world. Really, this cowboy attitude is a good reason
> why graphics folks have such a bad rep. You keep poking into random
> kernel internals, don't talk to anoyone and then complain if people
> are upset. This shouldn't be surprising.

Not really agreeing on the cowboy thing. The fundamental problem is that
the dma api provides abstraction that seriously gets in the way of writing
a gpu driver. Some examples:

- We never want bounce buffers, ever. dma_map_sg gives us that, so there's
hacks to fall back to a cache of pages allocated using
dma_alloc_coherent if you build a kernel with bounce buffers.

- dma api hides the cache flushing requirements from us. GPUs love
non-snooped access, and worse give userspace control over that. We want
a strict separation between mapping stuff and flushing stuff. With the
IOMMU api we mostly have the former, but for the later arch maintainers
regularly tells they won't allow that. So we have drm_clflush.c.

- dma api hides how/where memory is allocated. Kinda similar problem,
except now for CMA or address limits. So either we roll our own
allocators and then dma_map_sg (and pray it doesn't bounce buffer), or
we use dma_alloc_coherent and then grab the sgt to get at the CMA
allocations because that's the only way. Which sucks, because we can't
directly tell CMA how to back off if there's some way to make CMA memory
available through other means (gpus love to hog all of memory, so we
have shrinkers and everything).

For display drivers the dma api mostly works, until you start sharing
buffers with other devices.

So from our perspective it looks fairly often that core folks just don't
want to support gpu use-cases, so we play a bit more cowboy and get things
done some other way. Since this has been going on for years now we often
don't even bother to start a discussion first, since it tended to go
nowhere useful.

> > Granted, this issue could've been caught with a little more testing, but
> > in retrospect I think it would've been a lot better if ARM_DMA_USE_IOMMU
> > was just enabled unconditionally if it has side-effects that platforms
> > don't opt in to but have to explicitly opt out of.
> Agreed on that count. Please send a patch.

Daniel Vetter
Software Engineer, Intel Corporation