RE: [[PATCH v1] 05/37] [CIFS] SMBD: Implement API for upper layer to create SMBD transport and establish RDMA connection

From: Long Li
Date: Tue Aug 29 2017 - 22:36:03 EST




> -----Original Message-----
> From: Tom Talpey
> Sent: Monday, August 14, 2017 12:55 PM
> To: Long Li <longli@xxxxxxxxxxxxx>; Steve French <sfrench@xxxxxxxxx>;
> linux-cifs@xxxxxxxxxxxxxxx; samba-technical@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> Subject: RE: [[PATCH v1] 05/37] [CIFS] SMBD: Implement API for upper layer
> to create SMBD transport and establish RDMA connection
>
> > -----Original Message-----
> > From: linux-cifs-owner@xxxxxxxxxxxxxxx [mailto:linux-cifs-
> > owner@xxxxxxxxxxxxxxx] On Behalf Of Long Li
> > Sent: Wednesday, August 2, 2017 4:10 PM
> > To: Steve French <sfrench@xxxxxxxxx>; linux-cifs@xxxxxxxxxxxxxxx;
> > samba- technical@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Cc: Long Li <longli@xxxxxxxxxxxxx>
> > Subject: [[PATCH v1] 05/37] [CIFS] SMBD: Implement API for upper layer
> > to create SMBD transport and establish RDMA connection
> >
> > From: Long Li <longli@xxxxxxxxxxxxx>
> >
> > Implement the code for connecting to SMBD server. The client and
> > server are connected using RC Queue Pair over RDMA API, which
> > suppports Infiniband, RoCE and iWARP. Upper layer code can call
> > cifs_create_rdma_session to establish a SMBD RDMA connection.
> >
> > +/* Upcall from RDMA CM */
> > +static int cifs_rdma_conn_upcall(
> > + struct rdma_cm_id *id, struct rdma_cm_event *event) {
> > + struct cifs_rdma_info *info = id->context;
> > +
> > + log_rdma_event("event=%d status=%d\n", event->event,
> > + event->status);
> > +
> > + switch (event->event) {
> > + case RDMA_CM_EVENT_ADDR_RESOLVED:
> > + case RDMA_CM_EVENT_ROUTE_RESOLVED:
> > + info->ri_rc = 0;
> > + complete(&info->ri_done);
> > + break;
> > +
> > + case RDMA_CM_EVENT_ADDR_ERROR:
> > + info->ri_rc = -EHOSTUNREACH;
> > + complete(&info->ri_done);
> > + break;
> > +
> > + case RDMA_CM_EVENT_ROUTE_ERROR:
> > + info->ri_rc = -ENETUNREACH;
> > + complete(&info->ri_done);
> > + break;
> > +
> > + case RDMA_CM_EVENT_ESTABLISHED:
> > + case RDMA_CM_EVENT_CONNECT_ERROR:
> > + case RDMA_CM_EVENT_UNREACHABLE:
> > + case RDMA_CM_EVENT_REJECTED:
> > + case RDMA_CM_EVENT_DEVICE_REMOVAL:
> > + log_rdma_event("connected event=%d\n", event->event);
> > + info->connect_state = event->event;
> > + break;
> > +
> > + case RDMA_CM_EVENT_DISCONNECTED:
> > + break;
> > +
> > + default:
> > + break;
> > + }
> > +
> > + return 0;
> > +}
>
> This code looks a lot like the connection stuff in the NFS/RDMA RPC transport.
> Does your code have the same needs? If so, you might consider moving this
> to a common RDMA handler.

>
> > +/* Upcall from RDMA QP */
> > +static void
> > +cifs_rdma_qp_async_error_upcall(struct ib_event *event, void
> > +*context) {
> > + struct cifs_rdma_info *info = context;
> > + log_rdma_event("%s on device %s info %p\n",
> > + ib_event_msg(event->event), event->device->name,
> > +info);
> > +
> > + switch (event->event)
> > + {
> > + case IB_EVENT_CQ_ERR:
> > + case IB_EVENT_QP_FATAL:
> > + case IB_EVENT_QP_REQ_ERR:
> > + case IB_EVENT_QP_ACCESS_ERR:
> > +
> > + default:
> > + break;
> > + }
> > +}
>
> Ditto. But, what's up with the empty switch(event->event) processing?

I have changed to code to disconnect RDMA connection on QP errors.


>
> > +static struct rdma_cm_id* cifs_rdma_create_id(
> > + struct cifs_rdma_info *info, struct sockaddr *dstaddr)
> > +{
> ...
> > + log_rdma_event("connecting to IP %pI4 port %d\n",
> > + &addr_in->sin_addr, ntohs(addr_in->sin_port));
> >... and then...
> > + if (dstaddr->sa_family == AF_INET6)
> > + sport = &((struct sockaddr_in6 *)dstaddr)->sin6_port;
> > + else
> > + sport = &((struct sockaddr_in *)dstaddr)->sin_port;
> > +
> > + *sport = htons(445);
> ...and
> > +out:
> > + // try port number 5445 if port 445 doesn't work
> > + if (*sport == htons(445)) {
> > + *sport = htons(5445);
> > + goto try_again;
> > + }
>
> Suggest rearranging the log_rdma_event() call to reflect reality.
>
> The IANA-assigned port for SMB Direct is 5445, and port 445 will be listening
> on TCP. Should you really be probing that port before 5445?
> I suggest not doing so unconditionally.

This part is reworked in V3 to behave as you suggested.

>
> > +struct cifs_rdma_info* cifs_create_rdma_session(
> > + struct TCP_Server_Info *server, struct sockaddr *dstaddr) {
> > ...
> > + int max_pending = receive_credit_max + send_credit_target;
> >...
> > + if (max_pending > info->id->device->attrs.max_cqe ||
> > + max_pending > info->id->device->attrs.max_qp_wr) {
> > + log_rdma_event("consider lowering receive_credit_max and "
> > + "send_credit_target. Possible CQE overrun, device "
> > + "reporting max_cpe %d max_qp_wr %d\n",
> > + info->id->device->attrs.max_cqe,
> > + info->id->device->attrs.max_qp_wr);
> > + goto out2;
> > + }
>
> I don't understand this. Why are you directing both Receive and Send
> completions to the same CQ, won't that make it very hard to manage
> completions and their interrupts? Also, what device(s) have you seen trigger
> this log? CQ's are generally allowed to be quite large.

I have moved them to separate completion queues in V3.

>
> > + conn_param.responder_resources = 32;
> > + if (info->id->device->attrs.max_qp_rd_atom < 32)
> > + conn_param.responder_resources =
> > + info->id->device->attrs.max_qp_rd_atom;
> > + conn_param.retry_count = 6;
> > + conn_param.rnr_retry_count = 6;
> > + conn_param.flow_control = 0;
>
> These choices warrant explanation. 32 responder resources is on the large
> side for most datacenter networks. And because of SMB Direct's credits, why
> is RNR_retry not simply zero?
>

Those have been fixed. On Mellanox ConnectX3, the attrs.max_qp_rd_atom is 16. So I choose a default value 32 to work with hardware that can potentially support more responder resources.