RE: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes

From: Steve Wise
Date: Fri Jan 05 2007 - 14:00:37 EST


On Fri, 2007-01-05 at 09:32 -0800, Felix Marti wrote:
>
> > -----Original Message-----
> > From: openib-general-bounces@xxxxxxxxxx [mailto:openib-general-
> > bounces@xxxxxxxxxx] On Behalf Of Steve Wise
> > Sent: Friday, January 05, 2007 6:22 AM
> > To: Roland Dreier
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; openib-general@xxxxxxxxxx;
> > netdev@xxxxxxxxxxxxxxx
> > Subject: Re: [openib-general] [PATCH v4 01/13] Linux RDMA Core Changes
> >
> > On Thu, 2007-01-04 at 13:34 -0800, Roland Dreier wrote:
> > > OK, I'm back from vacation today.
> > >
> > > Anyway I don't have a definitive statement on this right now. I
> guess
> > > I agree that I don't like having an extra parameter to a function
> that
> > > should be pretty fast (although req notify isn't quite as hot as
> > > something like posting a send request or polling a cq), given that
> it
> > > adds measurable overhead. (And I am surprised that the overhead is
> > > measurable, since 3 arguments still fit in registers, but OK).
> > >
> > > I also agree that adding an extra entry point just to pass in the
> user
> > > data is ugly, and also racy.
> > >
> > > Giving the kernel driver a pointer it can read seems OK I guess,
> > > although it's a little ugly to have a backdoor channel like that.
> > >
> >
> > Another alternative is for the cq-index u32 memory to be allocated by
> > the kernel and mapped into the user process. So the lib can
> read/write
> > it, and the kernel can read it directly. This is the fastest way
> > perfwise, but I didn't want to do it because of the page granularity
> of
> > mapping. IE it would require a page of address space (and backing
> > memory I guess) just for 1 u32. The CQ element array memory is
> already
> > allocated this way (and its DMA coherent too), but I didn't want to
> > overload that memory with this extra variable either. Mapping just
> > seemed ugly and wasteful to me.
> >
> > So given 3 approaches:
> >
> > 1) allow user data to be passed into ib_req_notify_cq() via the
> standard
> > uverbs mechanisms.
> >
> > 2) hide this in the chelsio driver and have the driver copyin the info
> > directly.
> >
> > 3) allocate the memory for this in the kernel and map it to the user
> > process.
> >
> > I chose 1 because it seemed the cleanest from an architecture point of
> > view and I didn't think it would impact performance much.
>
> [Felix Marti] In addition, is arming the CQ really in the performance
> path? - Don't apps poll the CQ as long as there are pending CQEs and
> only arm the CQ for notification once there is nothing left to do? If
> this is the case, it would mean that we waste a few cycles 'idle'
> cycles.

I tend to agree. This shouldn't be the hot perf path like
post_send/recv and poll.

> Steve, next to the micro benchmark that you did, did you run any
> performance benchmark that actually moves traffic? If so, did you see a
> difference in performance?
>

No. But I didn't explicitly measure with and without this one single
change.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/