Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag

From: Alex Deucher
Date: Wed Apr 25 2018 - 02:24:51 EST

Next message: Joey Pabalinas: "Re: [PATCH v4 0/2] tty/nozomi: general module cleanup"
Previous message: Daniel Vetter: "Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag"
In reply to: Daniel Vetter: "Re: [Linaro-mm-sig] noveau vs arm dma ops"
Next in thread: Christoph Hellwig: "Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag"

On Wed, Apr 25, 2018 at 2:13 AM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Wed, Apr 25, 2018 at 7:48 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> On Tue, Apr 24, 2018 at 09:32:20PM +0200, Daniel Vetter wrote:
>>> Out of curiosity, how much virtual flushing stuff is there still out
>>> there? At least in drm we've pretty much ignored this, and seem to be
>>> getting away without a huge uproar (at least from driver developers
>>> and users, core folks are less amused about that).
>>
>> As I've just been wading through the code, the following architectures
>> have non-coherent dma that flushes by virtual address for at least some
>> platforms:
>>
>> - arm [1], arm64, hexagon, nds32, nios2, parisc, sh, xtensa, mips,
>> powerpc
>>
>> These have non-coherent dma ops that flush by physical address:
>>
>> - arc, arm [1], c6x, m68k, microblaze, openrisc, sparc
>>
>> And these do not have non-coherent dma ops at all:
>>
>> - alpha, h8300, riscv, unicore32, x86
>>
>> [1] arm seems to do both virtually and physically based ops, further
>> audit needed.
>>
>> Note that using virtual addresses in the cache flushing interface
>> doesn't mean that the cache actually is virtually indexed, but it at
>> least allows for the possibility.
>>
>>> > I think the most important thing about such a buffer object is that
>>> > it can distinguish the underlying mapping types. While
>>> > dma_alloc_coherent, dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT,
>>> > dma_map_page/dma_map_single/dma_map_sg and dma_map_resource all give
>>> > back a dma_addr_t they are in no way interchangeable. And trying to
>>> > stuff them all into a structure like struct scatterlist that has
>>> > no indication what kind of mapping you are dealing with is just
>>> > asking for trouble.
>>>
>>> Well the idea was to have 1 interface to allow all drivers to share
>>> buffers with anything else, no matter how exactly they're allocated.
>>
>> Isn't that interface supposed to be dmabuf? Currently dma_map leaks
>> a scatterlist through the sg_table in dma_buf_map_attachment /
>> ->map_dma_buf, but looking at a few of the callers it seems like they
>> really do not even want a scatterlist to start with, but check that
>> it contains a physically contiguous range first. So kicking the
>> scatterlist out of there will probably improve the interface in general.
>
> I think by number most drm drivers require contiguous memory (or an
> iommu that makes it look contiguous). But there's plenty others who
> have another set of pagetables on the gpu itself and can
> scatter-gather. Usually it's the former for display/video blocks, and
> the latter for rendering.
>
>>> dma-buf has all the functions for flushing, so you can have coherent
>>> mappings, non-coherent mappings and pretty much anything else. Or well
>>> could, because in practice people hack up layering violations until it
>>> works for the 2-3 drivers they care about. On top of that there's the
>>> small issue that x86 insists that dma is coherent (and that's true for
>>> most devices, including v4l drivers you might want to share stuff
>>> with), and gpus really, really really do want to make almost
>>> everything incoherent.
>>
>> How do discrete GPUs manage to be incoherent when attached over PCIe?
>
> It has a non-coherent transaction mode (which the chipset can opt to
> not implement and still flush), to make sure the AGP horror show
> doesn't happen again and GPU folks are happy with PCIe. That's at
> least my understanding from digging around in amd the last time we had
> coherency issues between intel and amd gpus. GPUs have some bits
> somewhere (in the pagetables, or in the buffer object description
> table created by userspace) to control that stuff.

Right. We have a bit in the GPU page table entries that determines
whether we snoop the CPU's cache or not.
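
As a rough illustration of what such a per-page bit looks like, here is a
minimal sketch in C. Everything in it is hypothetical: the DEMO_* names, the
bit positions and the address mask are made up for the example and are not
the real amdgpu (or any other hardware) page table layout.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical GPU page table entry layout, for illustration only.
 * The bit positions and the mask below are assumptions, not taken
 * from amdgpu or any real device.
 */
#define DEMO_PTE_VALID     (UINT64_C(1) << 0)
#define DEMO_PTE_SNOOPED   (UINT64_C(1) << 1)
#define DEMO_PTE_ADDR_MASK UINT64_C(0x0000FFFFFFFFF000)

/* Build a PTE for one 4 KiB page; 'snoop' selects CPU cache snooping. */
static inline uint64_t demo_pte_make(uint64_t dma_addr, bool snoop)
{
	uint64_t pte = (dma_addr & DEMO_PTE_ADDR_MASK) | DEMO_PTE_VALID;

	if (snoop)
		pte |= DEMO_PTE_SNOOPED;
	return pte;
}

/* Report whether accesses through this PTE would snoop the CPU cache. */
static inline bool demo_pte_is_snooped(uint64_t pte)
{
	return (pte & DEMO_PTE_SNOOPED) != 0;
}
```

With a layout like this, demo_pte_make(addr, true) would give a mapping that
snoops the CPU's cache, and demo_pte_make(addr, false) a non-snooped one that
needs explicit flushing on the CPU side.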

Alex

>
> For anything on the SoC it's presented as pci device, but that's
> extremely fake, and we can definitely do non-snooped transactions on
> drm/i915. Again, controlled by a mix of pagetables and
> userspace-provided buffer object description tables.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
