Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma

From: Logan Gunthorpe
Date: Tue Jan 29 2019 - 16:31:04 EST




On 2019-01-29 1:57 p.m., Jerome Glisse wrote:
> GPU driver must be in control and must be call to. Here there is 2 cases
> in this patchset and i should have instead posted 2 separate patchset as
> it seems that it is confusing things.
>
> For the HMM page, the physical address of the page ie the pfn does not
> correspond to anything ie there is nothing behind it. So the importing
> device has no idea how to get a valid physical address from an HMM page
> only the device driver exporting its memory with HMM device memory knows
> that.
>
>
> For the special vma ie mmap of a device file. GPU driver do manage their
> BAR ie the GPU have a page table that map BAR page to GPU memory and the
> driver _constantly_ update this page table, it is reflected by invalidating
> the CPU mapping. In fact most of the time the CPU mapping of GPU object are
> invalid they are valid only a small fraction of their lifetime. So you
> _must_ have some call to inform the exporting device driver that another
> device would like to map one of its vma. The exporting device can then
> try to avoid as much churn as possible for the importing device. But this
> has consequence and the exporting device driver must be allow to apply
> policy and make decission on wether or not it authorize the other device
> to peer map its memory. For GPU the userspace application have to call
> specific API that translate into specific ioctl which themself set flags
> on object (in the kernel struct tracking the user space object). The only
> way to allow program predictability is if the application can ask and know
> if it can peer export an object (ie is there enough BAR space left).

This all seems like it's an HMM problem and not related to mapping
BARs/"potential BARs" to userspace. If some code wants to DMA map HMM
pages, it calls an HMM function to map them. If HMM needs to consult
with the driver on aspects of how that's mapped, then that's between HMM
and the driver and not something I really care about. But making the
entire mapping stuff tied to userspace VMAs does not make sense to me.
What if somebody wants to map some HMM pages in the same way but from
kernel space and they therefore don't have a VMA?


>> I also figured there'd be a fault version of p2p_ioremap_device_memory()
>> for when you are mapping P2P memory and you want to assign the pages
>> lazily. Though, this can come later when someone wants to implement that.
>
> For GPU the BAR address space is manage page by page and thus you do not
> want to map a range of BAR addresses but you want to allow mapping of
> multiple page of BAR address that are not adjacent to each other nor
> ordered in anyway. But providing helper for simpler device does make sense.

Well, this has little do with the backing device but how the memory is
mapped into userspace. With p2p_ioremap_device_memory() the entire range
is mapped into the userspace VMA immediately during the call to mmap().
With p2p_fault_device_memory(), mmap() would not actually map anything
and a page in the VMA would be mapped only when userspace accesses it
(using fault()). It seems to me like GPUs would prefer the latter but if
HMM takes care of the mapping from userspace potential pages to actual
GPU pages through the BAR then that may not be true.

Logan