Re: Enabling peer to peer device transactions for PCIe devices

From: Jason Gunthorpe
Date: Mon Nov 28 2016 - 14:03:06 EST


On Mon, Nov 28, 2016 at 06:19:40PM +0000, Haggai Eran wrote:
> > > GPU memory. We create a non-ODP MR pointing to VRAM but rely on
> > > user-space and the GPU not to migrate it. If they do, the MR gets
> > > destroyed immediately.
> > That sounds horrible. How can that possibly work? What if the MR is
> > being used when the GPU decides to migrate?
> Naturally this doesn't support migration. The GPU is expected to pin
> these pages as long as the MR lives. The MR invalidation is done only as
> a last resort to keep system correctness.

That just forces applications to handle horrible unexpected
failures. If this sort of thing is needed for correctness then OOM
kill the offending process; don't corrupt its operation.

> I think it is similar to how non-ODP MRs rely on user-space today to
> keep them correct. If you do something like madvise(MADV_DONTNEED) on a
> non-ODP MR's pages, you can still get yourself into a data corruption
> situation (HCA sees one page and the process sees another for the same
> virtual address). The pinning that we use only guarantees the HCA's page
> won't be reused.

That is not really data corruption - the data still goes where it was
originally destined. That is an application violating the
requirements of an MR. An application cannot munmap/mremap a VMA
while a non-ODP MR points to it and then keep using the MR.
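
As a rough illustration of the process-side half of the madvise case
(no RDMA or MR involved, just a private anonymous mapping on Linux;
the function name is made up for the example), something like this
shows the process getting a different page at the same virtual
address:

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from any driver or library.
 * Fills one private anonymous page with 0xAB, drops it with
 * MADV_DONTNEED, then reads the first byte back.
 * Returns the byte that was read, or -1 on error.
 */
int first_byte_after_dontneed(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	int seen;

	if (page <= 0)
		return -1;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch the page so it is backed by a real physical page. */
	memset(buf, 0xAB, (size_t)page);

	/* Tell the kernel the contents are no longer needed. */
	if (madvise(buf, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(buf, (size_t)page);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	seen = buf[0];

	munmap(buf, (size_t)page);
	return seen;
}
```

On Linux the read after the madvise sees zero rather than 0xAB,
because a private anonymous range is repopulated with zero-fill
pages on the next touch. With a non-ODP MR registered over that
range the HCA would keep using the originally pinned page, which is
the mismatch described above.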

That is totally different from a GPU driver wanting to mess with
translation to physical pages.

> > From what I understand we are not really talking about kernel p2p,
> > everything proposed so far is being mediated by a userspace VMA, so
> > I'd focus on making that work.

> Fair enough, although we will need both eventually, and I hope the
> infrastructure can be shared to some degree.

What use case do you see for in-kernel p2p?

Presumably in-kernel could use a vmap or something and the same basic
flow?

Jason