Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

From: ira.weiny
Date: Fri Apr 10 2015 - 02:16:40 EST

First off there are 2 separate issues here:

1) We need to communicate if a port supports or requires various management
support from the ib_mad, ib_cm, and/or ib_sa modules.

2) We need to communicate how a addresses are formated and resolved for a
particular port

In general I don't think we need to remove all uses of the Transport
or Link Layer.

Although we may be able to remove most of the transport uses.

On Wed, Apr 08, 2015 at 02:10:15PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:
> > To straighten all this out, lets break management out into the two
> > distinct types:
> >
> > rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
> > multicast. The proper test for this with my bitmap above is a simple
> > transport & RDMA_MGMT_IB test. If will be true for IB and OPA fabrics.
> > rdma_port_conn_mgmt() <- connection management, which we currently
> > support everything except USNIC (correct Sean?), so a test would be
> > something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
> > out into two subgroups, IB style and iWARP stype connection management
> > (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
> > above bitmap, since I didn't give IBOE its own transport type, these
> > subgroups still boil down to the simple tests transport & iWARP and
> > transport & IB like they do today.
> There is a lot more variation here than just these two tests, and those
> two tests won't scale to include OPA.
> SMI Y N Y (though the OPA smi looked a bit different)

Yes OPA is different but it is based on the class version of the individual
MADs not any particular device/port support.



IB SMP Y N Y (OPA supports the IB NodeInfo query)


How is this different from the SMI?

> SA Y N Y
> PM Y Y Y (? guessing for OPA)

> CM Y Y Y
> GMP needs GRH N Y N
> It may be unrealistic, but I was hoping we could largely scrub the
> opaque 'is spec iWARP, is spec ROCEE' kinds of tests because they
> don't tell anyone what it is the code cares about.

I somewhat agree except for things like addressing. In the area of addressing
I think we are likely to need to define something like "cap_addr_ib",
"cap_addr_iboe", "cap_addr_iwarp". See below for more details.

> Maybe what is needed is a more precise language for the functions:
> > > + * cap_ib_mad - Check if the port of device has the capability
> > > Infiniband
> > > + * Management Datagrams.
> As used this seems to mean:
> True if the port can do IB/OPA SMP, or GMP management packets on QP0 or
> QP1. (Y Y Y) ie: Do we need the MAD layer at all.
> ib_smi seems to be true if QP0 is supported (Y N Y)
> Maybe the above set would make a lot more sense as:
> cap_ib_qp0
> cap_ib_qp1
> cap_opa_qp0

I disagree.

All we need right now is is cap_qp0. All devices currently support QP1.

Then after all this is settled I can add:

OPA MAD Space N N Y Port is OPA MAD space.

> ib_cm seems to mean that the CM protocol from the IBA is used on the
> port (Y Y Y)


> ib_sa means the IBA SA protocol is supported (Y Y Y)

I think this should be (Y N Y)

IBoE has no SA. The IBoE code "fabricates" a Path Record it does not need to
interact with the SA.

> ib_mcast true if the IBA SA protocol is used for multicast GIDs (Y N
> Y)

Given the above why can't we just have the "ib_sa" flag?

> ipoib means the port supports the ipoib protocol (Y N ?)

OPA does IPoIB so... (Y N Y)

However, I think checking the link layer is more appropriate here. It does not
make sense to do IP over IB over Eth. Even though the IBoE can do the "IB"

Making flags in the driver to indicate which ULPs they support is a _bad_

FWIW: I don't consider the MAD and SA (multicast) modules ULPs. Rather they
are helper modules which are built to share code amongst the drivers to process
things on behalf of the drivers themselves. As such advertising a need for or
particular support within those modules make sense.

So although strictly speaking we could do IPoIBoEth, I think having IPoIB check
the LL and limiting itself to ports which are IB LL is appropriate.

> This seem reasonable and understandable, even if they are currently a
> bit duplicating.
> > Patch 9/17:
> >
> > Most of the other comments on this patch stand as they are. I would add
> > the test:
> >
> > rdma_port_separate_read_sge(dev, port)
> > {
> > return dev->port[port]->transport & RDMA_SEPERATE_READ_SGE;
> > }
> >
> > and add the helper function:
> >
> > rdma_port_get_read_sge(dev, port)
> > {
> > if (rdma_transport_is_iwarp)
> > return 1;
> > return dev->port[port]->max_sge;
> > }
> Hum, that is nice, but it doesn't quite fit with how the ULP needs to
> work. The max limit when creating a WR is the value passed into the
> qp_cap, not the device maximum limit.
> To do this properly we need to extend the qp_cap, and that is just too
> big a change. A one bit iWarp quirk is OK for now.

I agree. This is the one place we probably want to just keep the "Transport"

> > As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm
> I actually really prefer cap_mandatory_grh - that is what is going on
> here. ie based on that name (as a reviewer) I'd expect to see the mad
> layer check that the mandatory GRH is always present, or blow up.

While GRH mandatory (for the GMP) is what this is. The function
ib_init_ah_from_path generically is really handling an "IBoE address" to send
to and therefore we need to force the GRH in the AH.

There is a whole slew of LL and Transport checks which are used to
format/resolve addresses.

Functions which need to know that the address format is "IBoE"

-- cma.c: cma_acquire_dev
-- cma.c: cma_modify_qp_rtr
-- sa_query.c: ib_init_ah_from_path
-- verbs.c: ib_resolve_eth_l2_attrs

What about a check rdma_port_req_iboe_addr()?

Functions where AF_IB is checked against LL because it does not make sense to
use AF_IB on anything but an IB LL

-- cma.c: cma_listen_on_dev
-- cma.c: cma_bind_loopback

These checks should be directly against the LL

Functions where the ARP type is checked against the link layer.

-- cma.c: cma_acquire_dev
-- cma.c: cma_bind_loopback

These checks should be directly against the LL

> Some of the other checks in this file revolve around pkey, I'm not
> sure what rocee does there? cap_pkey_supported ?

It seems IBoE just hardcodes the pkey to 0xffff. I don't see it used anywhere.

Function where port requires "real" PKey

-- cma.c: cma_ib_init_qp_attr

Check rdma_port_req_pkey()?

Over all for the addressing choices:

The "Transport" (or protocol, or whatever) is Verbs. The Layer below Verbs
(OPA/IB/Eth/TCP) defines how addressing, route, and connection information is
generated, communicated, and used.

As Jason and Doug have been saying sometimes we want to know when that requires
SA interaction or the use of the CM protocol (or neither). Other times we just
need to know what the Address format or Link Layer is.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at