Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

From: Jason Gunthorpe
Date: Wed Jan 16 2019 - 12:24:43 EST


On Wed, Jan 16, 2019 at 05:11:34PM +0100, hch@xxxxxx wrote:
> On Tue, Jan 15, 2019 at 02:25:01PM -0700, Jason Gunthorpe wrote:
> > RDMA needs something similar as well, in this case drivers take a
> > struct page * from get_user_pages() and need to have the DMA map fail
> > if the platform can't DMA map in a way that does not require any
> > additional DMA API calls to ensure coherence. (Think userspace RDMA
> > MRs.)
>
> Any time you DMA map pages you need to do further DMA API calls to
> ensure coherence; that is the way it is implemented. These calls
> just happen to be no-ops sometimes.
>
> > Today we just do the normal DMA map, and when it randomly doesn't work
> > and corrupts data we tell those people their platforms don't support
> > RDMA. It would be nice to have a safer API-based solution.
>
> Now that all these drivers are consolidated in rdma-core you can fix
> the code to actually do the right thing. It isn't that userspace DMA
> coherence is any harder than in-kernel DMA coherence. It just is
> that no one bothered to do it properly.

If I recall, we actually can't: libverbs presents an API to the user
that does not consider this possibility.

For example, consider post_recv: the driver has no idea which user
buffers received data and can't possibly flush them transparently. The
user would have to call some special DMA syncing API, which we don't have.

It is the same reason the kernel API makes the ULP handle dma sync,
not the driver.

The fact is there is zero industry interest in using RDMA on platforms
that can't do HW DMA cache coherency. The kernel syscalls required to
do the cache flushing on the IO path would just destroy performance to
the point of making RDMA pointless. Better to use netdev on those
platforms.

VFIO is in a similar boat. Their user API can't handle cache syncing
either, so they would use the same API too.

The GPU-compute systems (e.g. OpenCL/CUDA) are like verbs: they
were never designed with incoherent DMA in mind, and don't have the
API design to support it.

The reality is that *all* the subsystems doing DMA kernel bypass are
ignoring the DMA mapping rules. I think we should support this better,
and just accept that user space DMA will not be using syncing. Block
access in cases where syncing is required, otherwise let it work as it
does today.
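
As a rough illustration of "block access when syncing is required",
something along these lines could work. This is only a userspace mock
to show the shape of the check: struct mock_dev, its needs_sync flag and
mock_dma_map_user_nosync() are all hypothetical names, not existing
kernel or rdma-core APIs, and the identity "mapping" is a placeholder.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's DMA properties; not a kernel type. */
struct mock_dev {
	/* true if a normal streaming mapping would need dma_sync-style calls */
	bool needs_sync;
};

/*
 * Hypothetical mapping helper for user memory (assumption, not a real API).
 * It refuses to map when the platform would need sync calls that the
 * user-facing API has no way to issue, instead of mapping and silently
 * corrupting data later.
 */
static int mock_dma_map_user_nosync(const struct mock_dev *dev, void *addr,
				    size_t len, uintptr_t *dma_addr)
{
	if (!dev || !addr || !len || !dma_addr)
		return -EINVAL;

	/* Block access: this platform cannot do sync-free DMA. */
	if (dev->needs_sync)
		return -EOPNOTSUPP;

	/* Mock only: pretend the bus address equals the CPU address. */
	*dma_addr = (uintptr_t)addr;
	return 0;
}
```

A caller such as MR registration would then fail up front on an
incoherent platform rather than handing out a mapping that cannot be
kept coherent.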

Jason