Re: [PATCH v2 04/13] Connection Manager

From: Steve Wise
Date: Tue Dec 05 2006 - 12:52:04 EST


On Tue, 2006-12-05 at 20:26 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 10:47:25AM -0600, Steve Wise (swise@xxxxxxxxxxxxxxxxxxxxx) wrote:
> > > And if there were a dataflow between addr/port a.b to addr/port c.d
> > > already, it will either terminated?
> > >
> > > Considering the following sequence:
> > > handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
> > > example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
> > > skb and sets port/addr/route and other fields to be used as a base for rdma
> > > connection. What if it just a usual network packet from kernelspace or
> > > userspace with the same payload as should be sent by remote rdma system?
> > >
> >
> > That skb isn't a network packet. Its a CPL_PASS_ACCEPT_REQ message (see
> > struct cpl_pass_accept_req in the Ethernet driver t3_cpl.h). If the
> > RDMA driver hadn't registered to listen on that addr/port, it would
> > never get this skb. Once a connection is established, the MPA messages
> > (and any TCP payload data) is delivered to the RDMA driver in the form
> > of skb's containing struct cpl_rx_data. So these skbs aren't just TCP
> > packets at all. They either control messages or TCP payload. Either way
> > they are encapsulated in CPL message structures.
> >
> > Does this make sense?
>
> Almost - except the case about where those skbs are coming from?
> It looks like they are obtained from network, since it is ethernet
> driver, and if they match some set of rules, they are considered as valid
> MPA negotiation protocol.

They come from the Ethernet driver, but that driver manages multiple HW
queues and these packets come from an offload queue, not the NIC queue.
So the HW demultiplexes.

Perhaps Divy or Felix from Chelsio can expand on how the Ethernet driver
manages this?

>
> If it is correct, it means that any packet in the network can be
> potentially 'stolen' by rdma hardware, although it was part of the usual
> dataflow.
> If that packets are not from ethernet network, but from different
> low-level, then there is a question (besides why this driver is called
> ethernet if it manages different hardware) about how connection over
> that different media is being setup and since packets contain perfectly
> valid IP addresses and ports.

The HW has different queues for offload vs native Ethernet frames. I'm
not an expert on the Ethernet driver, so you'll have to consult that
code and ask questions of Divy and/or Felix.

> And, btw, not related question - does postponing the whole skb multiplexing
> to work queue result in lower latency and/or higher speed?
> Since there are a lot of tricks introduced to minimize gap between
> interrupt/napi polling and protocol processing, so such huge postponing
> with the whole context switch looks strange.
>

Neither. The work queue makes the RDMA driver's life easier because it
has context to allocate skbs, for instance. Note all the work queue
stuff is done _only_ for RDMA connection setup and teardown. Once the
connection is in RDMA mode, there's no work queues at all for IO, and CQ
notifications happen in interrupt context. RDMA operations are
submitted to the hardware via iwch_post_send(). Completion notification
is done in the interrupt context via iwch_ev_dispatch(). And completion
entries reaped by the consumer application via iwch_poll_cq().


Steve.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/