Re: Interacting with coherent memory on external devices

From: Christoph Lameter
Date: Thu Apr 23 2015 - 10:25:25 EST

Next message: Alex Bennée: "Re: [PATCH v2 06/10] KVM: arm64: guest debug, add SW break point support"
Previous message: gregkh@xxxxxxxxxxxxxxxxxxx: "Re: [PATCH] Staging: rtl8712: Removed unused variable"
In reply to: Benjamin Herrenschmidt: "Re: Interacting with coherent memory on external devices"
Next in thread: Austin S Hemmelgarn: "Re: Interacting with coherent memory on external devices"

On Thu, 23 Apr 2015, Benjamin Herrenschmidt wrote:

> They are via MMIO space. The big differences here are that via CAPI the
> memory can be fully cachable and thus have the same characteristics as
> normal memory from the processor point of view, and the device shares
> the MMU with the host.
>
> Practically what that means is that the device memory *is* just some
> normal system memory with a larger distance. The NUMA model is an
> excellent representation of it.

I sure wish you were working on using these features to increase
performance and the speed of communication with devices.

Device memory is inherently different from main memory (otherwise the
device would be using main memory) and thus not really NUMA. NUMA at least
assumes that the basic characteristics of memory are the same while just
the access speeds vary. GPU memory has very different performance
characteristics and the various assumptions on memory that the kernel
makes for the regular processors may not hold anymore.

> > For my use cases the advantage of CAPI lies in the reduction of latency
> > for coprocessor communication. I hope that CAPI will allow fast cache to
> > cache transactions between a coprocessor and the main one. This would
> > improve the ability to exchange data rapidly between application code
> > and some piece of hardware (NIC, GPU, custom hardware etc etc)
> >
> > Fundamentally this is currently a design issue since CAPI is running on
> > top of PCI-E and PCI-E transactions establish a minimum latency that
> > cannot be avoided. So it's hard to see how CAPI can improve the situation.
>
> It's on top of the lower layers of PCIe yes, I don't know the exact
> latency numbers. It does enable the device to own cache lines though and
> vice versa.

Could you come up with a way to allow faster device communication by
improving on the PCI-E cacheline handoff via CAPI? That is the useful
thing I expected from it. If the processor can transfer a word into a
CAPI device faster, or get status back faster, then that is a valuable
thing.
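
To make the handoff concrete, here is a minimal sketch of a one-word
mailbox that lives in a single cache line. It is not CAPI code and uses no
CAPI API: the names (struct mailbox, mailbox_post, mailbox_poll) and the
64 byte line size are assumptions for illustration, and the "device" side
is simply whatever agent shares the line coherently with the processor.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE 64  /* assumed; the real line size is platform-specific */

/* Hypothetical mailbox: payload and sequence flag share one cache line,
 * so a single line transfer hands over both. */
struct mailbox {
    alignas(CACHELINE) uint64_t payload;
    _Atomic uint32_t seq;  /* bumped by the sender after each payload write */
};

static_assert(sizeof(struct mailbox) == CACHELINE,
              "mailbox should occupy exactly one assumed cache line");

/* Sender: write the word, then publish it with a release store. */
static void mailbox_post(struct mailbox *mb, uint64_t word)
{
    mb->payload = word;
    uint32_t next = atomic_load_explicit(&mb->seq, memory_order_relaxed) + 1;
    atomic_store_explicit(&mb->seq, next, memory_order_release);
}

/* Receiver: returns true and fills *word if something new was posted
 * since *last_seen; the acquire load orders the payload read after it. */
static bool mailbox_poll(struct mailbox *mb, uint32_t *last_seen,
                         uint64_t *word)
{
    uint32_t s = atomic_load_explicit(&mb->seq, memory_order_acquire);
    if (s == *last_seen)
        return false;
    *word = mb->payload;
    *last_seen = s;
    return true;
}
```

This only covers a single sender posting one word at a time and waiting
for it to be consumed before posting again. The latency that matters is
how long the line takes to move from the sender's cache to the poller's,
which is exactly the part that PCI-E puts a floor under.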

