Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

From: Thomas Hellstrom
Date: Wed Jan 16 2019 - 02:10:28 EST

On Tue, 2019-01-15 at 21:58 +0100, hch@xxxxxx wrote:
> On Tue, Jan 15, 2019 at 07:13:11PM +0000, Koenig, Christian wrote:
> > Thomas is correct that the interface you propose here doesn't work
> > at
> > all for GPUs.
> >
> > The kernel driver is not informed of flush/sync, but rather just
> > setups
> > coherent mappings between system memory and devices.
> >
> > In other words you have an array of struct pages and need to map
> > that to
> > a specific device and so create dma_addresses for the mappings.
> If you want a coherent mapping you need to use dma_alloc_coherent
> and dma_mmap_coherent and you are done, that is not the problem.
> That actually is one of the vmgfx modes, so I don't understand what
> problem we are trying to solve if you don't actually want a non-
> coherent mapping.

For vmwgfx, not making dma_alloc_coherent default has a couple of
1) Memory is associated with a struct device. It has not been clear
that it is exportable to other devices.
2) There seems to be restrictions in the system pages allowable. GPUs
generally prefer highmem pages but dma_alloc_coherent returns a virtual
address implying GFP_KERNEL? While not used by vmwgfx, TTM typically
prefers HIGHMEM pages to facilitate caching mode switching without
having to touch the kernel map.
3) Historically we had APIs to allow coherent access to user-space
defined pages. That has gone away not but the infrastructure was built
around it.

dma_mmap_coherent isn't use because as the data moves between system
memory, swap and VRAM, PTEs of user-space mappings are adjusted
accordingly, meaning user-space doesn't have to unmap when an operation
is initiated that might mean the data is moved.

> Although last time I had that discussion with Daniel Vetter
> I was under the impressions that GPUs really wanted non-coherent
> mappings.

Intel historically has done things a bit differently. And it's also
possible that embedded platforms and ARM prefer this mode of operation,
but I haven't caught up on that discussion.

> But if you want a coherent mapping you can't go to a struct page,
> because on many systems you can't just map arbitrary memory as
> uncachable. It might either come from very special limited pools,
> or might need other magic applied to it so that it is not visible
> in the normal direct mapping, or at least not access through it.

The TTM subsystem has been relied on to provide coherent memory with
the option to switch caching mode of pages. But only on selected and
well tested platforms. On other platforms we simply do not load, and
that's fine for now.

But as mentioned multiple times, to make GPU drivers more compliant,
we'd really want that

bool dma_streaming_is_coherent(const struct device *)

API to help us decide when to load or not.