Re: [RFC PATCH 0/5] net: low latency Ethernet device polling
From: Rick Jones
Date: Wed Feb 27 2013 - 14:58:14 EST
On 02/27/2013 09:55 AM, Eliezer Tamir wrote:
> This patchset adds the ability for the socket layer code to poll directly
> on an Ethernet device's RX queue. This eliminates the cost of the interrupt
> and context switch and with proper tuning allows us to get very close
> to the HW latency.
> This is a follow up to Jesse Brandeburg's Kernel Plumbers talk from last year.
> Patch 1 adds ndo_ll_poll and the IP code to use it.
> Patch 2 is an example of how TCP can use ndo_ll_poll.
> Patch 3 shows how this method would be implemented for the ixgbe driver.
> Patch 4 adds statistics to the ixgbe driver for ndo_ll_poll events.
> (Optional) Patch 5 is a handy kprobes module to measure detailed latency
> this patchset is also available in the following git branch
>
> Kernel   Config     C3/6  rx-usecs  TCP  UDP
> 3.8rc6   typical    off   adaptive  37k  40k
> 3.8rc6   typical    off   0*        50k  56k
> 3.8rc6   optimized  off   0*        61k  67k
> 3.8rc6   optimized  on    adaptive  26k  29k
> patched  typical    off   adaptive  70k  78k
> patched  optimized  off   adaptive  79k  88k
> patched  optimized  off   100       84k  92k
> patched  optimized  on    adaptive  83k  91k
>
> *rx-usecs=0 is usually not useful in a production environment.
I would think that latency-sensitive folks would be using rx-usecs=0 in
production - at least if the NIC in use didn't have low enough latency
with its default interrupt coalescing/avoidance heuristics.
If I take the first "pure" A/B comparison it seems that the change as
benchmarked takes latency for TCP from ~27 usec (37k) to ~14 usec (70k).
At what request/response size does the benefit taper off? 13 usec
works out to about 16250 bytes of wire time at 10 GbE.
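For anyone who wants to redo the arithmetic above, here is a rough sketch. It assumes a single transaction in flight, so that latency is simply the reciprocal of the transaction rate, and it ignores all header and framing overhead on the wire. The function names are mine, not anything from netperf or the patchset.

```python
def rr_latency_usec(transactions_per_sec):
    """Time per request/response round trip, assuming one transaction in flight."""
    return 1e6 / transactions_per_sec


def bytes_in_time(usec, link_bits_per_sec=10e9):
    """Bytes a link can serialize in `usec`, ignoring header/framing overhead."""
    return usec * 1e-6 * link_bits_per_sec / 8


before = rr_latency_usec(37_000)   # 3.8rc6, typical, adaptive (TCP)
after = rr_latency_usec(70_000)    # patched, typical, adaptive (TCP)

print(f"before: {before:.1f} usec, after: {after:.1f} usec")
print(f"saved:  {before - after:.1f} usec per transaction")
print(f"13 usec at 10 GbE: {bytes_in_time(13):.0f} bytes")
```

This prints roughly 27.0 and 14.3 usec for the two rates, a saving of about 13 usec, and 16250 bytes for the wire-time equivalent.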
When I last looked at netperf TCP_RR performance in a setup where something
similar could happen, I think it was IPoIB, where it was possible to set
things up such that polling happened rather than wakeups (perhaps it was
with a shim library that converted netperf's socket calls to "native" IB).
My recollection is that it "did a number" on the netperf service demands
thanks to the spinning. It would be a good thing to include those
figures in any subsequent rounds of benchmarking.
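To illustrate why spinning hurts service demand, here is a minimal sketch. It assumes service demand is CPU time consumed per transaction (system-wide utilization times number of CPUs, divided by the transaction rate), which is my understanding of what netperf reports for RR tests. The core count is hypothetical, and the only measured input is the 70k TCP figure from the table.

```python
def service_demand_usec(cpu_util_fraction, num_cpus, transactions_per_sec):
    """CPU microseconds consumed per transaction.

    cpu_util_fraction is system-wide utilization (0.0 to 1.0) across num_cpus.
    """
    cpu_usec_per_sec = cpu_util_fraction * num_cpus * 1e6
    return cpu_usec_per_sec / transactions_per_sec


NUM_CPUS = 8                      # hypothetical core count, not from the thread
spinning_util = 1.0 / NUM_CPUS    # one core kept 100% busy polling

sd = service_demand_usec(spinning_util, NUM_CPUS, 70_000)
print(f"{sd:.1f} usec/transaction")
```

With one core spinning continuously, the service demand on that side comes out equal to the full round-trip time (about 14.3 usec at 70k transactions/s), regardless of how many cores the system has. That is the effect worth seeing next to the latency numbers.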
Am I correct in assuming this is a mechanism which would not be used in
a high aggregate PPS situation?