Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

From: Koenig, Christian
Date: Wed Jan 16 2019 - 02:28:28 EST

Am 16.01.19 um 08:09 schrieb Thomas Hellstrom:
> On Tue, 2019-01-15 at 21:58 +0100, hch@xxxxxx wrote:
>> On Tue, Jan 15, 2019 at 07:13:11PM +0000, Koenig, Christian wrote:
>>> Thomas is correct that the interface you propose here doesn't work
>>> at
>>> all for GPUs.
>>> The kernel driver is not informed of flush/sync, but rather just
>>> setups
>>> coherent mappings between system memory and devices.
>>> In other words you have an array of struct pages and need to map
>>> that to
>>> a specific device and so create dma_addresses for the mappings.
>> If you want a coherent mapping you need to use dma_alloc_coherent
>> and dma_mmap_coherent and you are done, that is not the problem.
>> That actually is one of the vmgfx modes, so I don't understand what
>> problem we are trying to solve if you don't actually want a non-
>> coherent mapping.
> For vmwgfx, not making dma_alloc_coherent default has a couple of
> reasons:
> 1) Memory is associated with a struct device. It has not been clear
> that it is exportable to other devices.
> 2) There seems to be restrictions in the system pages allowable. GPUs
> generally prefer highmem pages but dma_alloc_coherent returns a virtual
> address implying GFP_KERNEL? While not used by vmwgfx, TTM typically
> prefers HIGHMEM pages to facilitate caching mode switching without
> having to touch the kernel map.
> 3) Historically we had APIs to allow coherent access to user-space
> defined pages. That has gone away not but the infrastructure was built
> around it.
> dma_mmap_coherent isn't use because as the data moves between system
> memory, swap and VRAM, PTEs of user-space mappings are adjusted
> accordingly, meaning user-space doesn't have to unmap when an operation
> is initiated that might mean the data is moved.

To summarize once more: We have an array of struct pages and want to
coherently map that to a device.

If that is not possible because of whatever reason we want to get an
error code or even not load the driver from the beginning.

>> Although last time I had that discussion with Daniel Vetter
>> I was under the impressions that GPUs really wanted non-coherent
>> mappings.
> Intel historically has done things a bit differently. And it's also
> possible that embedded platforms and ARM prefer this mode of operation,
> but I haven't caught up on that discussion.
>> But if you want a coherent mapping you can't go to a struct page,
>> because on many systems you can't just map arbitrary memory as
>> uncachable. It might either come from very special limited pools,
>> or might need other magic applied to it so that it is not visible
>> in the normal direct mapping, or at least not access through it.
> The TTM subsystem has been relied on to provide coherent memory with
> the option to switch caching mode of pages. But only on selected and
> well tested platforms. On other platforms we simply do not load, and
> that's fine for now.
> But as mentioned multiple times, to make GPU drivers more compliant,
> we'd really want that
> bool dma_streaming_is_coherent(const struct device *)
> API to help us decide when to load or not.

Yes, please.


> Thanks,
> Thomas