Re: [Linaro-mm-sig] [PATCH 4/8] dma-buf: add peer2peer flag

From: Dan Williams
Date: Wed Apr 25 2018 - 14:38:12 EST


On Wed, Apr 25, 2018 at 10:44 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> On Wed, Apr 25, 2018 at 2:41 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> On Wed, Apr 25, 2018 at 02:24:36AM -0400, Alex Deucher wrote:
>>> > It has a non-coherent transaction mode (which the chipset can opt to
>>> > not implement and still flush), to make sure the AGP horror show
>>> > doesn't happen again and GPU folks are happy with PCIe. That's at
>>> > least my understanding from digging around in amd the last time we had
>>> > coherency issues between intel and amd gpus. GPUs have some bits
>>> > somewhere (in the pagetables, or in the buffer object description
>>> > table created by userspace) to control that stuff.
>>>
>>> Right. We have a bit in the GPU page table entries that determines
>>> whether we snoop the CPU's cache or not.
>>
>> I can see how that works with the GPU on the same SOC or SOC set as the
>> CPU. But how is that going to work for a GPU that is a plain old PCIe
>> card? The cache snooping in that case is happening in the PCIe root
>> complex.
>
> I'm not a pci expert, but as far as I know, the device sends either a
> snooped or non-snooped transaction on the bus. I think the
> transaction descriptor supports a no snoop attribute. Our GPUs have
> supported this feature for probably 20 years if not more, going back
> to PCI. Using non-snooped transactions have lower latency and faster
> throughput compared to snooped transactions.

Right, 'no snoop' and 'relaxed ordering' have been part of the PCI
spec since forever. With a plain old PCI-E card the root complex
indeed arranges for caches to be snooped. 'No snoop' is not always
faster, though: depending on the platform, 'no snoop' traffic may be
relegated to fewer bus resources than the much more common snooped
traffic.
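
For reference, here is a rough sketch of where the two knobs live on
the device side. It decodes a PCIe Device Control register value and
reports whether the function is permitted to set the Relaxed Ordering
and No Snoop attributes on requests it initiates. The mask values are
assumed to match PCI_EXP_DEVCTL_RELAX_EN and PCI_EXP_DEVCTL_NOSNOOP_EN
in the kernel's pci_regs.h; the struct and function names are made up
for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copies of two Device Control bit masks. Assumption: these match
 * PCI_EXP_DEVCTL_RELAX_EN and PCI_EXP_DEVCTL_NOSNOOP_EN in pci_regs.h.
 */
#define DEVCTL_RELAX_EN    0x0010u  /* Enable Relaxed Ordering */
#define DEVCTL_NOSNOOP_EN  0x0800u  /* Enable No Snoop */

/* Hypothetical summary of what the function is allowed to request. */
struct tlp_attr_policy {
	bool may_relax_order;  /* device may set the Relaxed Ordering attribute */
	bool may_skip_snoop;   /* device may set the No Snoop attribute */
};

/*
 * Decode a raw Device Control value. The enable bits only grant
 * permission; whether a given transaction actually carries the
 * attribute is still up to the device (e.g. a GPU page table bit).
 */
static struct tlp_attr_policy decode_devctl(uint16_t devctl)
{
	struct tlp_attr_policy p;

	p.may_relax_order = (devctl & DEVCTL_RELAX_EN) != 0;
	p.may_skip_snoop = (devctl & DEVCTL_NOSNOOP_EN) != 0;
	return p;
}
```

In a driver the raw value would presumably come from something like
pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &val). Note that the
enable bit only says what the device may ask for; how the platform
treats the resulting traffic is a separate question.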