Re: [RFC v2: Patch 1/3] net: hand off skb list to other cpu to submit to upper layer
From: Ben Hutchings
Date: Fri Mar 13 2009 - 18:11:27 EST
On Fri, 2009-03-13 at 14:01 -0700, Tom Herbert wrote:
> On Fri, Mar 13, 2009 at 11:51 AM, David Miller <davem@xxxxxxxxxxxxx> wrote:
> >
> > From: Tom Herbert <therbert@xxxxxxxxxx>
> > Date: Fri, 13 Mar 2009 10:06:56 -0700
> >
> > > You'll definitely want to look at the hardware provided hash. We've
> > > been using a 10G NIC which provides a Toeplitz hash (the one defined
> > > by Microsoft) and a software RSS-like capability to move packets from
> > > an interrupting CPU to another for processing. The hash could be used
> > > to index to a set of CPUs, but we also use the hash as a connection
> > > identifier to key into a lookup table to steer packets to the CPU
> > > where the application is running based on the running CPU of the last
> > > recvmsg. Using the device provided hash in this manner is a HUGE win,
> > > as opposed to taking cache misses to get the 4-tuple from the packet itself to
> > > compute a hash. I posted some patches a while back on our work if
> > > you're interested.
> >
> > I never understood this.
> >
> > If you don't let the APIC move the interrupt around, the individual
> > MSI-X interrupts will steer packets to individual specific CPUs and as
> > a result the scheduler will migrate tasks over to those CPUs since the
> > wakeup events keep occurring there.
>
> We are trying to follow the scheduler's decisions as opposed to leading
> it. This works on very loaded systems, with applications binding to
> cpusets, with threads that are receiving on multiple sockets. I
> suppose it might be compelling if a NIC could steer packets per flow,
> instead of by a hash...

Depending on the NIC, RX queue selection may be done using a large
number of bits of the hash value and an indirection table or by matching
against specific values in the headers. The SFC4000 supports both of
these, though limited to TCP/IPv4 and UDP/IPv4. I think Neptune may be
more flexible. Of course, both indirection table entries and filter
table entries will be limited resources in any NIC, so allocating these
wholly automatically is an interesting challenge.
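
To make the two mechanisms concrete, here is a rough sketch in plain C of
how a receive path might combine them: try an exact-match filter on the
header fields first, and fall back to indexing an indirection table with
the low bits of the hash. This is only an illustration, not the SFC4000
or Neptune implementation. Every name in it (rx_steering, flow_key,
select_rxq, rx_steering_init) and both table sizes are assumptions made
up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes for illustration only; real NICs differ. */
#define INDIR_SIZE  128	/* must be a power of two for the mask below */
#define NUM_FILTERS 4

/* Hypothetical 5-tuple used by the exact-match filters. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

struct filter_entry {
	int             in_use;
	struct flow_key key;
	uint16_t        rxq;
};

struct rx_steering {
	uint16_t            indir[INDIR_SIZE];    /* hash bits -> RX queue */
	struct filter_entry filters[NUM_FILTERS]; /* exact header matches */
};

/* Compare field by field so struct padding is never inspected. */
static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Clear all filters and spread the indirection table round-robin. */
static void rx_steering_init(struct rx_steering *s, uint16_t n_queues)
{
	size_t i;

	assert(n_queues > 0);
	memset(s, 0, sizeof(*s));
	for (i = 0; i < INDIR_SIZE; i++)
		s->indir[i] = (uint16_t)(i % n_queues);
}

/* A filter hit overrides the hash; otherwise use the indirection table. */
static uint16_t select_rxq(const struct rx_steering *s,
			   const struct flow_key *k, uint32_t hash)
{
	size_t i;

	for (i = 0; i < NUM_FILTERS; i++)
		if (s->filters[i].in_use && key_equal(&s->filters[i].key, k))
			return s->filters[i].rxq;

	return s->indir[hash & (INDIR_SIZE - 1)];
}
```

With four queues and the round-robin fill, a flow whose hash is 130 goes
to queue 2 until a filter is installed for it. Both tables are small and
fixed in size here, which is where the allocation problem comes from.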

Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.