CONFIG_PACKET_MMAP revisited
From: odain2@xxxxxxxxxxxxxx
Date:  Tue Oct 28 2003 - 23:10:53 EST
I've been looking into faster ways to do packet captures and I stumbled on
the following discussion on the Linux Kernel mailing list:
http://www.ussg.iu.edu/hypermail/linux/kernel/0202.2/1173.html
In that discussion Jamie Lokier suggested having a memory buffer that's
shared between user and kernel space and having the NIC do DMA transfers
directly to that buffer as an alternative to using Alexy's shared ring
buffer stuff.  The argument was that this would avoid the memory copy that
the kernel does from the DMA buffer to the memory mapped ring buffer. 
However, Alan Cox pointed out that the main cost of the memory copy is
getting the data from system memory (where the NIC put it via DMA) into the
L1 cache (DMA doesn't do any cache coherence so it can't go there
directly).  The memory copy (presumably from L1 cache to L1 cache) is
insignificant compared to this cost and since you'll need to get the data
into L1 cache to use it anyway, the memory copy is virtually free.
I'm wondering if this takes all of the costs into account.  If I understand
how this works the user space application can't get at the packet without a
context switch so that the kernel can first copy the packet to the shared
buffer.  The cost of the context switch is pretty high and this seems to me
to be the main bottleneck.  I believe that in normal operation each packet
(or with NICs that do interrupt coalescing, every n packets) causes an
interrupt which causes a context switch, the kernel then copies the data
from the DMA buffer to the shared buffer and does a RETI.  That's fairly
expensive.  If, on the other hand, data could be copied directly to
user-space accessible memory the NIC wouldn't need to generate any
interrupts and the kernel wouldn't need to get involved at all (this
assumes a NIC that can be configured not to generate any interrupts).  The
user space application could then poll the shared buffer and process
packets as fast as possible (some synchronization mechanism is clearly
needed here, but I think some of the high-end programmable NICs could do
this).  Would this not be significantly more efficient then the current
implementation?
Thank,
Oliver
PS: I'm not a mailing list subscriber so CCs on responses would be
appreciated.
--------------------------------------------------------------------
mail2web - Check your email from the web at
http://mail2web.com/ .
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/