Re: zero-copy TCP

From: Jamie Lokier
Date: Sun Sep 03 2000 - 15:37:30 EST

Andi Kleen wrote:
> On Sun, Sep 03, 2000 at 05:22:44AM +0200, Jamie Lokier wrote:
> > I just thought I'd mention that you can do zero copy TCP in and out
> > *without* any page marking schemes. All you need is a network card with
> > quite a lot of RAM and some intelligence. An Alteon could do it, with
> > extra RAM or an impressively underloaded network.
> The big problem with that is that you end up with not having a single
> stack, but N + 1 (N being the number of intelligent cards) stacks.
> Maintenance nightmare.

No, the stack stays in the kernel. That's the beauty of this scheme.

There are any number of proposed "put the stack on the intelligent card"
schemes out there, and they all suffer from the same problem: it's not
quite the stack you want + maintenance nightmare.

For the scheme I am referring to, all protocol handling is done by the
kernel, using the normal code. The card does have a little
intelligence: it implements a cache of data payloads, and can do
checksums. That's all. (It doesn't know about TCP or even IP for example).

Run time is like this:

1. write/sendto

   user space calls write()
   create skbuffs as usual including data part, but mark as "cached on NIC"
   DMA user space data directly to NIC (needs driver hook)
   schedule if worthwhile
   DMA completes, return

2. read/recvfrom

   user space calls read()
   look for skbuffs as usual
   if skbuff says "data part cached on NIC", DMA it from NIC
   schedule if worthwhile
   DMA completes, return

3. NIC driver xmit function

   if skbuff says "data part cached on NIC", there's simply less to DMA

4. NIC driver receive function

   fetch enough data from the NIC for header processing.
     (NIC may help minimise this, but a constant is good enough)
   mark new skbuff as "data part cached on NIC"
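
To make the four steps concrete, here is a rough user-space model in C.
It is only a sketch: every name in it (struct nic, struct skb, model_write
and so on) is made up for illustration and is not a kernel or driver API.
memcpy() stands in for DMA, checksums are left out, and the receive path
is handed the header length directly rather than fetching a constant
number of bytes first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space model only. Every name below is hypothetical, not a kernel API. */
#define NIC_SLOTS     8     /* payload buffers in the card's RAM */
#define NIC_SLOT_SIZE 2048  /* bytes per payload buffer */
#define HDR_MAX       64    /* header area kept in host memory */

struct nic {
    unsigned char ram[NIC_SLOTS][NIC_SLOT_SIZE];
    int used[NIC_SLOTS];
};

struct skb {
    unsigned char hdr[HDR_MAX]; /* protocol headers, always on the host */
    size_t hdr_len;
    int nic_slot;               /* >= 0 means "data part cached on NIC" */
    size_t data_len;
};

/* Driver-side management of card memory, independent of wire I/O. */
static int nic_alloc(struct nic *n)
{
    for (int i = 0; i < NIC_SLOTS; i++)
        if (!n->used[i]) { n->used[i] = 1; return i; }
    return -1;
}

static void nic_free(struct nic *n, int slot)
{
    assert(slot >= 0 && slot < NIC_SLOTS && n->used[slot]);
    n->used[slot] = 0;
}

/* 1. write/sendto: payload goes from the user buffer straight to the card. */
static int model_write(struct nic *n, struct skb *skb,
                       const void *hdr, size_t hdr_len,
                       const void *user, size_t len)
{
    if (hdr_len > HDR_MAX || len > NIC_SLOT_SIZE)
        return -1;
    int slot = nic_alloc(n);
    if (slot < 0)
        return -1;
    memcpy(n->ram[slot], user, len);  /* stands in for DMA user -> NIC */
    memcpy(skb->hdr, hdr, hdr_len);   /* headers are built by the host stack */
    skb->hdr_len = hdr_len;
    skb->nic_slot = slot;
    skb->data_len = len;
    return 0;
}

/* 3. xmit: the host hands over only the header; the card appends its cached
 * payload. The slot is kept here, since TCP may need to retransmit. */
static size_t model_xmit(const struct nic *n, const struct skb *skb,
                         unsigned char *wire, size_t cap)
{
    if (skb->nic_slot < 0 || skb->hdr_len + skb->data_len > cap)
        return 0;
    memcpy(wire, skb->hdr, skb->hdr_len);
    memcpy(wire + skb->hdr_len, n->ram[skb->nic_slot], skb->data_len);
    return skb->hdr_len + skb->data_len;
}

/* 4. receive: payload stays in card RAM; the host fetches header bytes only. */
static int model_rx(struct nic *n, struct skb *skb,
                    const unsigned char *wire, size_t len, size_t hdr_len)
{
    if (hdr_len > HDR_MAX || hdr_len > len || len - hdr_len > NIC_SLOT_SIZE)
        return -1;
    int slot = nic_alloc(n);
    if (slot < 0)
        return -1;
    memcpy(n->ram[slot], wire + hdr_len, len - hdr_len);
    memcpy(skb->hdr, wire, hdr_len);
    skb->hdr_len = hdr_len;
    skb->nic_slot = slot;
    skb->data_len = len - hdr_len;
    return 0;
}

/* 2. read/recvfrom: payload goes from the card straight to the user buffer. */
static long model_read(struct nic *n, struct skb *skb, void *user, size_t cap)
{
    if (skb->nic_slot < 0 || skb->data_len > cap)
        return -1;
    memcpy(user, n->ram[skb->nic_slot], skb->data_len); /* DMA NIC -> user */
    nic_free(n, skb->nic_slot);
    skb->nic_slot = -1;
    return (long)skb->data_len;
}
```

In this model the payload is only ever copied between a user buffer and
card RAM. The host-side skb carries just the headers plus a slot number.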

I wouldn't be surprised to find even some of the older hardware NICs can
be made to work this way. Relatively little is required of the driver
-- just the ability to manage the card's memory independently of packet
I/O on the wire.

ps. Just to address Alan's point: data is never accessed directly over
the PCI bus. Perhaps he's thinking of another scheme.

-- Jamie

This archive was generated by hypermail 2b29 : Thu Sep 07 2000 - 21:00:16 EST