Re: Enabling peer to peer device transactions for PCIe devices

From: Haggai Eran
Date: Mon Nov 28 2016 - 14:54:07 EST


On Mon, 2016-11-28 at 09:57 -0700, Jason Gunthorpe wrote:
> On Sun, Nov 27, 2016 at 04:02:16PM +0200, Haggai Eran wrote:
> > I think blocking mmu notifiers against something that is basically
> > controlled by user-space can be problematic. This can block things
> > like memory reclaim. If you have user-space access to the device's
> > queues, user-space can block the mmu notifier forever.
> Right, I mentioned that..
Sorry, I must have missed it.

> > On PeerDirect, we have some kind of a middle-ground solution for
> > pinning GPU memory. We create a non-ODP MR pointing to VRAM but rely
> > on user-space and the GPU not to migrate it. If they do, the MR gets
> > destroyed immediately.
> That sounds horrible. How can that possibly work? What if the MR is
> being used when the GPU decides to migrate?
Naturally this doesn't support migration. The GPU is expected to pin
these pages as long as the MR lives. The MR invalidation is done only as
a last resort to keep system correctness.

I think it is similar to how non-ODP MRs rely on user-space today to
keep them correct. If you do something like madvise(MADV_DONTNEED) on a
non-ODP MR's pages, you can still get yourself into a data corruption
situation (the HCA sees one page and the process sees another for the
same virtual address). The pinning that we use only guarantees that the
HCA's page won't be reused.
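A minimal userspace sketch of the process side of that divergence, assuming Linux and a private anonymous mapping (the function name dontneed_replaces_page is hypothetical, for illustration only). It involves no HCA: if the buffer had been registered as a non-ODP MR, the pinned original page would keep the old bytes for the HCA, while the process below faults in a fresh zero-filled page at the same virtual address.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS and madvise() */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if, after MADV_DONTNEED, the process sees
 * a zero-filled page at the address it previously wrote to, 0 if it still
 * sees the old data, and -1 on error. No MR/HCA is involved here. */
int dontneed_replaces_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Private anonymous memory, like a typical buffer an app might register. */
    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, (size_t)page);   /* fill the original page */

    /* Drop the page from this process's address space. A pinned reference
     * (e.g. held by an MR) would keep the old page alive, but the process
     * no longer maps it. */
    if (madvise(buf, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    /* The next access faults in a new zero-filled page at the same address. */
    int result = (buf[0] == 0 && buf[page - 1] == 0) ? 1 : 0;
    munmap(buf, (size_t)page);
    return result;
}
```

On Linux this should return 1: the process's view of the address has silently changed, which is exactly the mismatch with a pinned MR page described above.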

> I would not support that
> upstream without a lot more explanation..
>
> I know people don't like requiring new hardware, but in this case we
> really do need ODP hardware to get all the semantics people want..
>
> >
> > Another thing I think is that while HMM is good for user-space
> > applications, for kernel p2p use there is no need for that. Using
> From what I understand we are not really talking about kernel p2p,
> everything proposed so far is being mediated by a userspace VMA, so
> I'd focus on making that work.
Fair enough, although we will need both eventually, and I hope the
infrastructure can be shared to some degree.