Re: [RFC PATCH 20/28] IB/core: Introduce API for initializing a RW ctx from a DMA address

From: Logan Gunthorpe
Date: Thu Jun 20 2019 - 14:24:46 EST




On 2019-06-20 11:11 a.m., Jason Gunthorpe wrote:
> On Thu, Jun 20, 2019 at 10:59:44AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-06-20 10:49 a.m., Jason Gunthorpe wrote:
>>> On Thu, Jun 20, 2019 at 10:12:32AM -0600, Logan Gunthorpe wrote:
>>>> Introduce rdma_rw_ctx_dma_init() and rdma_rw_ctx_dma_destroy() which
>>>> peform the same operation as rdma_rw_ctx_init() and
>>>> rdma_rw_ctx_destroy() respectively except they operate on a DMA
>>>> address and length instead of an SGL.
>>>>
>>>> This will be used for struct page-less P2PDMA, but there's also
>>>> been opinions expressed to migrate away from SGLs and struct
>>>> pages in the RDMA APIs and this will likely fit with that
>>>> effort.
>>>>
>>>> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
>>>> drivers/infiniband/core/rw.c | 74 ++++++++++++++++++++++++++++++------
>>>> include/rdma/rw.h | 6 +++
>>>> 2 files changed, 69 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
>>>> index 32ca8429eaae..cefa6b930bc8 100644
>>>> +++ b/drivers/infiniband/core/rw.c
>>>> @@ -319,6 +319,39 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
>>>> }
>>>> EXPORT_SYMBOL(rdma_rw_ctx_init);
>>>>
>>>> +/**
>>>> + * rdma_rw_ctx_dma_init - initialize a RDMA READ/WRITE context from a
>>>> + * DMA address instead of SGL
>>>> + * @ctx: context to initialize
>>>> + * @qp: queue pair to operate on
>>>> + * @port_num: port num to which the connection is bound
>>>> + * @addr: DMA address to READ/WRITE from/to
>>>> + * @len: length of memory to operate on
>>>> + * @remote_addr:remote address to read/write (relative to @rkey)
>>>> + * @rkey: remote key to operate on
>>>> + * @dir: %DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
>>>> + *
>>>> + * Returns the number of WQEs that will be needed on the workqueue if
>>>> + * successful, or a negative error code.
>>>> + */
>>>> +int rdma_rw_ctx_dma_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
>>>> + u8 port_num, dma_addr_t addr, u32 len, u64 remote_addr,
>>>> + u32 rkey, enum dma_data_direction dir)
>>>
>>> Why not keep the same basic signature here but replace the scatterlist
>>> with the dma vec ?
>>
>> Could do. At the moment, I had no need for dma_vec in this interface.
>
> I think that is because you only did nvme not srp/iser :)

I'm not sure that's true at least for the P2P case. With P2P we are able
to allocate one continuous region of memory for each transaction. It
would be quite weird to allocate multiple regions for a single transaction.

>>>> +{
>>>> + struct scatterlist sg;
>>>> +
>>>> + sg_dma_address(&sg) = addr;
>>>> + sg_dma_len(&sg) = len;
>>>
>>> This needs to fail if the driver is one of the few that require
>>> struct page to work..
>>
>> Yes, right. Currently P2PDMA checks for the use of dma_virt_ops. And
>> that probably should also be done here. But is that sufficient? You're
>> probably right that it'll take an audit of the RDMA tree to sort that out.
>
> For this purpose I'd be fine if you added a flag to the struct
> ib_device_ops that is set on drivers that we know are OK.. We can make
> that list bigger over time.

Ok, that would mirror what we did for the block layer. I'll look at
doing something like that in the near future.

Thanks,

Logan