Re: zero-copy TCP

From: Lincoln Dale (ltd@cisco.com)
Date: Sun Sep 03 2000 - 19:22:38 EST


At 22:53 03/09/00, Linus Torvalds wrote:
> >Ugh. User space DMA gets complicated quickly. The performance question
> >is, perhaps, can you do this without a TLB flush (but with locking the
> >struct page of course). Note that it doesn't matter if another thread,
> >and this includes truncate/write in another thread, clobbers the page
> >data. That's just the normal effect of two concurrent writers to the
> >same memory.
...
>People who claim "zero-copy" is a great thing often ignore the costs of
>_not_ copying altogether.

many people (myself included) have been experimenting with zerocopy
infrastructures.
in my case, i've been working on it as time permits for quite a few months
now, and am about on my fourth rewrite.

i've found exactly what you state about the bad things that occur when you
associate zerocopy infrastructure with user-space code. some of the the MM
tricks required for handling individual pages effectively kills any
performance gain.

however, approaching it from the other angle of "buffers pinned in kernel
memory" can give you a huge win.
for the application which prompted me to begin looking at this problem,
where packets typically go network -> RAM -> network, providing a zerocopy
infrastructure for (a) viewing incoming packet streams pinned in kernel
memory from user-space [a sort-of SIGIO with pointers to the buffers], and
(b) hooks for user-space directing the kernel to do things with these
buffers [eg. "queue buffer A for output on fd Y"] has provided an immediate
60% performance gain.

performance was previously pinned on front-side-bus (or memory) bandwidth.

the interfaces are a bit hacky, and the way one has to queue packets for
tcp-write is awful right now, but i hope these can be cleaned up over time.

network cards which offload the IP & TCP checksum calculation isn't even
required; provided the incoming checksum is preserved, the original pseudo
TCP header can be "reversed out" without having to re-read the entire
packet payloads again.

cheers,

lincoln.

--
   Lincoln Dale           Content Services Business Unit
   ltd@cisco.com          cisco Systems, Inc.       |         |
                                                    ||        ||
   +1 (408) 525-1274      bldg G, 170 West Tasman  ||||      ||||
   +61 (3) 9659-4294 <<   San Jose CA 95134    ..:||||||:..:||||||:.. 

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Sep 07 2000 - 21:00:16 EST