Re: Enabling peer to peer device transactions for PCIe devices

From: Dan Williams
Date: Tue Nov 22 2016 - 15:24:39 EST

On Tue, Nov 22, 2016 at 12:10 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>> <serguei.sagalovitch@xxxxxxx> wrote:
>>> I personally like the "device-DAX" idea, but my concerns are:
>>> - How well will it coexist with the DRM infrastructure / implementations,
>>> in the part dealing with CPU pointers?
>> Inside the kernel a device-DAX range is "just memory" in the sense
>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>> not migratable. To be honest I do not know how well that co-exists
>> with drm infrastructure.
>>> - How well will we be able to handle the case when we need to "move"/"evict"
>>> memory/data to a new location, so that the CPU pointer should point to the
>>> new physical location/address
>>> (which may not be in PCI device memory at all)?
>> So, device-DAX deliberately avoids support for in-kernel migration or
>> overcommit. Those cases are left to the core mm or drm. The device-dax
>> interface is for cases where all that is needed is a direct-mapping to
>> a statically-allocated physical-address range, be it persistent memory
>> or some other special reserved memory range.
> For some of the fancy use-cases (e.g. to be comparable to what HMM can
> pull off) I think we want all the magic in core mm, i.e. migration and
> overcommit. At least that seems to be the very strong drive in all
> general-purpose gpu abstractions and implementations, where memory is
> allocated with malloc, and then mapped/moved into vram/gpu address
> space through some magic, but still visible on both the cpu and gpu
> side in some form. A special device to allocate memory, and not being
> able to migrate stuff around, sound like misfeatures from that pov.

Agreed. For general-purpose P2P use cases where all you want is
direct-I/O to a memory range that happens to be on a PCIe device,
I think a special device fits the bill. For gpu P2P use cases that
already have migration/overcommit expectations, it is not a good