Re: [PATCH 0/2 RESEND] IB/Verbs: Use helpers to refine the checking on transport and link layer

From: Doug Ledford
Date: Thu Mar 26 2015 - 12:28:27 EST


On Thu, 2015-03-26 at 17:04 +0100, Michael Wang wrote:
> Hi, Doug
>
> Thanks for the excellent comments :-)
>
> On 03/26/2015 03:09 PM, Doug Ledford wrote:
> > On Wed, 2015-03-25 at 16:09 +0100, Michael Wang wrote:
> >> [snip]
> >>
> > [snip]
> >
> > So, I would suggest that we fix things up thusly:
> >
> > enum transport {
> > TRANSPORT_IB=1,
> > TRANSPORT_IWARP=2,
> > TRANSPORT_ROCE=4,
> > TRANSPORT_OPA=8,
> > TRANSPORT_USNIC=16,
> > };
> >
> > #define HAS_SA(ibdev) ((ibdev)->transport & (TRANSPORT_IB|TRANSPORT_OPA))
> > #define HAS_JUMBO_SA(ibdev) ((ibdev)->transport & TRANSPORT_OPA)
> >
> > or possibly
> >
> > static bool ib_dev_has_sa(struct ibv_device *ibdev)
> > {
> > return ibdev->transport & (TRANSPORT_IB | TRANSPORT_OPA);
> > }
>
> The idea sounds interesting, and here my silly questions come :-P
>
> So are you suggesting that we add a new bitmask 'transport' to 'struct ib_device'
> in the kernel, and set it up at the very beginning?
>
> A few more questions:
> 1. when to setup? (maybe inside ib_register_device() before doing client->add() callback?)

I don't think "we" can set it up here. The drivers have to set it up.
After all, the mlx4 driver will have to decide for itself what the port
transport is and tell us, we can't tell it.

> 2. how to setup? (still infer from the transport and link layer like we currently do?)

Find each point in each driver where they currently set the link layer
and transport fields today, and replace that with setting the new
transport bitmask instead.
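
As a rough illustration of that driver-side step, here is a minimal
user-space sketch. Every name in it (the sketch_* types, the hardware
port type enum, the helper itself) is hypothetical and not taken from
any existing driver. It only assumes that a driver already knows, per
port, whether the port runs native IB or Ethernet, and records the
matching bit.

```c
#include <stddef.h>

/* Hypothetical bit values, one bit per transport as proposed above. */
enum sketch_transport {
	SKETCH_TRANSPORT_IB    = 1 << 0,
	SKETCH_TRANSPORT_IWARP = 1 << 1,
	SKETCH_TRANSPORT_ROCE  = 1 << 2,
	SKETCH_TRANSPORT_OPA   = 1 << 3,
	SKETCH_TRANSPORT_USNIC = 1 << 4,
};

/* Hypothetical: what a dual-personality driver knows about each port. */
enum sketch_hw_port_type {
	SKETCH_HW_PORT_IB,
	SKETCH_HW_PORT_ETH,
};

/* Hypothetical per-port state that the core would read later. */
struct sketch_port {
	unsigned int transport;
};

/*
 * The driver, not the core, decides the transport of each port and
 * records it. An Ethernet port on this imagined device speaks RoCE.
 */
static void sketch_driver_set_transports(struct sketch_port *ports,
					 const enum sketch_hw_port_type *hw_types,
					 size_t nports)
{
	for (size_t i = 0; i < nports; i++)
		ports[i].transport = (hw_types[i] == SKETCH_HW_PORT_ETH)
			? SKETCH_TRANSPORT_ROCE
			: SKETCH_TRANSPORT_IB;
}
```

A device with one IB port and one Ethernet port would end up with a
different bit on each port, which is why a single per-device field
would not be enough.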

> 3. if a device has ports with different link layer types (please correct
> me if this will never happen), then a single bitmask may not be enough to
> represent the transport of all the ports? (maybe create a bitmask per port?)

Correct, a bitmask per port. And we can remove the existing transport
and link layer elements of the struct and replace them with just the new
transport. Then, whenever we need to copy a struct to user space, we
have a helper that looks something like this:

static inline void ib_set_user_transport(struct ib_device *ibdev,
					 struct user_ibv_device *uibdev,
					 u8 port)
{
	/* Map the internal per-port transport to the legacy ABI pair. */
	switch (ibdev->port[port]->transport) {
	case TRANSPORT_IB:
	case TRANSPORT_OPA:
		uibdev->port[port]->link_layer = INFINIBAND;
		uibdev->port[port]->transport = INFINIBAND;
		break;
	case TRANSPORT_IWARP:
		uibdev->port[port]->link_layer = INFINIBAND;
		uibdev->port[port]->transport = IWARP;
		break;
	case TRANSPORT_ROCE:
		uibdev->port[port]->link_layer = ETHERNET;
		uibdev->port[port]->transport = INFINIBAND;
		break;
	case TRANSPORT_USNIC:
		uibdev->port[port]->link_layer = ETHERNET;
		uibdev->port[port]->transport = <whatever USNIC uses today>;
		break;
	default:
		pr_err("unknown transport type %x\n",
		       ibdev->port[port]->transport);
	}
}

That preserves the user space ABI and all user programs keep working,
while we update to an internal representation that makes more sense for
how things have evolved.
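
To make the per-port idea concrete, here is a small user-space sketch
of a capability check against a per-port bitmask. The example_* names,
the fixed two-port layout, and the bit values are assumptions made for
illustration only; none of this is existing kernel code.

```c
#include <stdbool.h>

/* Hypothetical bit values: each transport gets its own bit. */
enum example_transport {
	EXAMPLE_TRANSPORT_IB    = 1 << 0,
	EXAMPLE_TRANSPORT_IWARP = 1 << 1,
	EXAMPLE_TRANSPORT_ROCE  = 1 << 2,
	EXAMPLE_TRANSPORT_OPA   = 1 << 3,
	EXAMPLE_TRANSPORT_USNIC = 1 << 4,
};

#define EXAMPLE_MAX_PORTS 2

/* Hypothetical stand-ins for the per-port and per-device structs. */
struct example_port {
	unsigned int transport;
};

struct example_device {
	struct example_port port[EXAMPLE_MAX_PORTS];
};

/* One bitwise test answers "does this port talk to an SA?". */
static inline bool example_port_has_sa(const struct example_device *dev,
				       unsigned int port)
{
	return (dev->port[port].transport &
		(EXAMPLE_TRANSPORT_IB | EXAMPLE_TRANSPORT_OPA)) != 0;
}
```

With port 0 set to EXAMPLE_TRANSPORT_IB and port 1 set to
EXAMPLE_TRANSPORT_ROCE, the check is true for port 0 and false for
port 1, without consulting a separate link layer field.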

> Regards,
> Michael Wang
>
> >
> > If we do this, then the only thing we have to fix up to preserve ABI
> > with user space is to make sure that any time we export an ibv_device
> > struct and any time we import the same, we convert from our new internal
> > representation to the old representation that user space expects. And
> > we also need to make a few changes in the sysfs code to display the
> > properties as things expect. But, that would allow us to fix up what I
> > see as a problem right now, which is that we hide the information we
> > need to know what sort of device we are working on in two different
> > fields: the transport and the link layer. Instead, just use one field
> > with enough variants that we can store all of the relevant information
> > we need in that one field. This has the benefit that any comparisons
> > that happen in hot paths will now always be a single bitwise comparison
> > and will no longer need to hit two separate variables for two separate
> > compares.
> >
> >
> >
>


--
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: 0E572FDD

