Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

From: Dan Williams
Date: Tue Apr 18 2017 - 18:51:33 EST


On Tue, Apr 18, 2017 at 3:42 PM, Jason Gunthorpe
<jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Apr 18, 2017 at 03:28:17PM -0700, Dan Williams wrote:
>
>> Unlike the pci bus address offset case which I think is fundamental to
>> support since shipping archs do this toda
>
> But we can support this by modifying those arch's unique dma_ops
> directly.
>
> Eg as I explained, my p2p_same_segment_map_page() helper concept would
> do the offset adjustment for same-segement DMA.
>
> If PPC calls that in their IOMMU drivers then they will have proper
> support for this basic p2p, and the right framework to move on to more
> advanced cases of p2p.
>
> This really seems like much less trouble than trying to wrapper all
> the arch's dma ops, and doesn't have the wonky restrictions.

I don't think the root bus iommu drivers have any business knowing or
caring about dma happening between devices lower in the hierarchy.

>> I think it is ok to say p2p is restricted to a single sgl that gets
>> to talk to host memory or a single device.
>
> RDMA and GPU would be sad with this restriction...
>
>> That said, what's wrong with a p2p aware map_sg implementation
>> calling up to the host memory map_sg implementation on a per sgl
>> basis?
>
> Setting up the iommu is fairly expensive, so getting rid of the
> batching would kill performance..

When we're crossing device and host memory boundaries how much
batching is possible? As far as I can see you'll always be splitting
the sgl on these dma mapping boundaries.