Re: [PATCH 1/2] drm: add cache support for arm64
From: Daniel Vetter
Date: Thu Aug 08 2019 - 07:58:15 EST
On Thu, Aug 08, 2019 at 11:55:06AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 07, 2019 at 10:48:56AM +0200, Daniel Vetter wrote:
> > > other drm drivers how do they guarantee addressability without an
> > > iommu?)
> >
> > We use shmem to get at swappable pages. We generally just assume that
> > the gpu can get at those pages, but things fall apart in fun ways:
> > - some setups somehow inject bounce buffers. Some drivers just give
> > up, others try to allocate a pool of pages with dma_alloc_coherent.
> > - some devices are misdesigned and can't access as much as the cpu. We
> > allocate using GFP_DMA32 to fix that.
>
> Well, for shmem you can't really call allocators directly, right?
We can pass gfp flags to shmem_read_mapping_page_gfp, which is just about
enough for the two cases of intel platforms where the gpu can only access
4G but the cpu has way more.
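For reference, that trick looks roughly like the below in a driver's
get_pages path (just a sketch; the gpu_needs_dma32 flag and the function
name are made up for illustration):

#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <drm/drm_gem.h>

/* Sketch only: pull in a backing page for a GEM object, restricted to
 * the low 4G iff the gpu can't address more than that.
 */
static struct page *gem_get_page(struct drm_gem_object *obj,
                                 pgoff_t index, bool gpu_needs_dma32)
{
        struct address_space *mapping = obj->filp->f_mapping;
        gfp_t gfp = mapping_gfp_mask(mapping);

        if (gpu_needs_dma32) {
                gfp &= ~__GFP_HIGHMEM;
                gfp |= __GFP_DMA32;
        }

        /* shmem allocates (or swaps back in) the page with this mask */
        return shmem_read_mapping_page_gfp(mapping, index, gfp);
}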
> One thing I have in my pipeline is a dma_alloc_pages API that allocates
> pages that are guaranteed to be addressable by the device or otherwise
> fail. But that doesn't really help with the shmem fs.
Yeah, the other drivers where the shmem gfp trick doesn't work copy
back&forth between the dma-able pages and the shmem swappable pages as
needed in their shrinker/allocation code (rough sketch below). I guess the
ideal would be if we could somehow fuse the custom allocator directly into
shmem.
Otoh once you start thrashing beyond system memory for gfx workloads it's
pretty hopeless anyway, and speed doesn't really matter anymore.
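For the drivers that do that copy dance, the shrinker side ends up looking
roughly like this (pure sketch; my_bo, dma_vaddr and friends are
hypothetical, no real driver spells it exactly this way, and error
handling is elided):

#include <linux/dma-mapping.h>
#include <linux/highmem.h>
#include <linux/shmem_fs.h>
#include <drm/drm_gem.h>

/* Hypothetical eviction path: copy a dma_alloc_coherent buffer back
 * into the object's shmem pages so the dma memory can be released
 * under memory pressure.
 */
static void bo_evict_to_shmem(struct device *dev, struct my_bo *bo)
{
        struct address_space *mapping = bo->base.filp->f_mapping;
        pgoff_t i;

        for (i = 0; i < bo->num_pages; i++) {
                struct page *page = shmem_read_mapping_page(mapping, i);
                void *dst = kmap(page);

                memcpy(dst, bo->dma_vaddr + i * PAGE_SIZE, PAGE_SIZE);
                kunmap(page);
                set_page_dirty(page);
                put_page(page);
        }

        dma_free_coherent(dev, bo->num_pages * PAGE_SIZE,
                          bo->dma_vaddr, bo->dma_addr);
        bo->dma_vaddr = NULL;
}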
> > Also modern gpu apis pretty much assume you can malloc() and then use
> > that directly with the gpu.
>
> Which is fine as long as the GPU itself supports full 64-bit addressing
> (or always sits behind an iommu), and the platform doesn't impose an
> addressing limit, which unfortunately some that are shipped right now
> still do :(
Yes, the userspace api people in khronos are occasionally a bit optimistic
:-)
> But userspace malloc really means dma_map_* anyway, so not really
> relevant for memory allocations.
It does tie in, since we'll want a dma_map which fails if a direct mapping
isn't possible. It would also help the driver code a lot if we could use
the same low-level flushing functions between our own memory (whatever that
is) and anon pages from malloc. And in all cases where that's not possible,
we want a failure, not elaborate attempts at hiding the differences between
all the possible architectures out there.
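Concretely, the ideal driver-side pattern would be about the same for our
own buffers and for malloc'ed pages, something like this (sketch only; the
"fail instead of bouncing" attribute doesn't exist today, everything else
is the existing streaming DMA API):

#include <linux/dma-mapping.h>

/* Sketch of the desired semantics: map a page for device access and
 * fail cleanly instead of silently bouncing.
 */
static int map_page_for_gpu(struct device *dev, struct page *page,
                            dma_addr_t *dma)
{
        *dma = dma_map_page_attrs(dev, page, 0, PAGE_SIZE,
                                  DMA_BIDIRECTIONAL,
                                  0 /* plus a hypothetical "no bounce" attr */);
        if (dma_mapping_error(dev, *dma))
                return -EFAULT;

        /* same flush call whether the page came from our own allocator
         * or from userspace malloc()
         */
        dma_sync_single_for_device(dev, *dma, PAGE_SIZE, DMA_BIDIRECTIONAL);
        return 0;
}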
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch