Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)

From: Andrew Morton
Date: Sun Jan 28 2001 - 00:34:51 EST

... wrote:
> Hello!
> > 2.4.1-pre10+zerocopy, using read()/write(): 38.1% CPU
> write() on a zc card is worse than a normal write() by definition.
> It generates split buffers.

Yes. The figures below show this: disabling SG+checksums speeds
up write() and send().

> Split buffers are more expensive and we have to pay for this.
> You have paid too much for a slow card, though. 8)
> Do you measure load correctly?

Yes. Quite confident about this. Here's the algorithm:

1: Run a cycle-soaker on each CPU on an otherwise unloaded
   system. See how much "work" they all do per second.

2: Run the cycle-soakers again, but with network traffic happening.
   See how much their "work" is reduced. Deduce networking CPU load
   from this difference.

   The networking code all runs SCHED_FIFO or in interrupt context,
   so the cycle-soakers have no effect upon the network code's access
   to the CPU.

   The "cycle-soakers" just sit there spinning and dirtying 10,000
   cachelines per second.
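A minimal sketch of the measurement, in Python, under stated assumptions: the names soak() and network_load(), the 64-byte cacheline and the 1 MiB buffer are hypothetical and not taken from the actual tool. soak() is a per-CPU cycle-soaker that counts spin iterations while dirtying roughly 10,000 cachelines per second; network_load() does the deduction in step 2.

```python
import time

CACHELINE_BYTES = 64   # assumption: typical x86 cacheline size
BUF_LINES = 16_384     # assumption: 1 MiB buffer, so the dirtied lines miss cache

def soak(duration_s, dirty_per_sec=10_000):
    """Spin for duration_s seconds on one CPU, dirtying roughly
    dirty_per_sec cachelines per second. Returns the number of spin
    iterations completed, i.e. the "work" done."""
    buf = bytearray(CACHELINE_BYTES * BUF_LINES)
    work = 0
    dirtied = 0
    start = time.perf_counter()
    while True:
        elapsed = time.perf_counter() - start
        if elapsed >= duration_s:
            return work
        # Catch up to the target dirtying rate, one write per cacheline.
        due = int(elapsed * dirty_per_sec)
        while dirtied < due:
            i = (dirtied % BUF_LINES) * CACHELINE_BYTES
            buf[i] = (buf[i] + 1) & 0xFF
            dirtied += 1
        work += 1

def network_load(baseline, loaded):
    """Fraction of total CPU taken by networking, deduced from how much
    per-CPU soaker work dropped once traffic started (step 2)."""
    if len(baseline) != len(loaded):
        raise ValueError("need one sample per CPU in both runs")
    lost = sum(b - l for b, l in zip(baseline, loaded))
    return lost / sum(baseline)
```

On an idle box, run one soak() per CPU (for instance in separate processes pinned with os.sched_setaffinity()) to get the baseline, then repeat with traffic flowing. With two CPUs, baseline counts [1000, 1000] and loaded counts [900, 904] give a network_load() of 0.098, i.e. roughly 9.8% of the machine.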

> > 2.4.1-pre10+zerocopy, using read()/write(): 39.2% CPU * hardware tx checksums disabled
> This is an illegal combination of parameters. By doing this you force two
> memory accesses. The fact that it does not add to the load is dubious. 8)8)

mm.. Perhaps with read()/write() the data is already in cache?

Anyway, I've tweaked up the tool again so it can do send() or
write() (then I looked at the implementation and wondered why
I'd bothered). It also does TCP_CORK now.
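For reference, a rough sketch of what the sender side of such a tool might look like, in Python. send_file() and its parameters are hypothetical; os.sendfile(), socket.sendall(), os.write() and the Linux-only socket.TCP_CORK option are real. The 64 kbyte chunk size matches the write()/send() tests below.

```python
import os
import socket

CHUNK = 64 * 1024  # 64 kbytes, the chunk size used in the write()/send() tests

def send_file(sock, path, method="sendfile", cork=False):
    """Hypothetical sender: push one file down a connected TCP socket using
    sendfile(), send() or write(). Returns the number of bytes sent."""
    if method not in ("sendfile", "send", "write"):
        raise ValueError(f"unknown method {method!r}")
    if cork:
        # Linux-only: hold back partial frames until the socket is uncorked.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    sent = 0
    with open(path, "rb") as f:
        if method == "sendfile":
            # The kernel reads from the page cache; no user-space buffer.
            size = os.fstat(f.fileno()).st_size
            while sent < size:
                n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
                if n == 0:
                    break  # file shrank underneath us
                sent += n
        else:
            # read() into a user buffer, then hand it to the socket.
            while chunk := f.read(CHUNK):
                if method == "send":
                    sock.sendall(chunk)
                else:
                    # write(2) on the socket fd may be short; loop until done.
                    view = memoryview(chunk)
                    while view:
                        view = view[os.write(sock.fileno(), view):]
                sent += len(chunk)
    if cork:
        # Uncorking flushes whatever partial frame is still queued.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
    return sent
```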

I ran another set of tests. With a 3c905C, the zerocopy patch improves
sendfile() hugely but slows down send()/write() significantly.

The kernels which were tested were 2.4.1-pre10 with and without the
zerocopy patch. We only look at client load (the TCP sender).

In all tests the link throughput was 11.5 Mbytes/sec (saturated
100baseT) unless otherwise noted.

The client (the thing which sends data) is a dual 500MHz PII with a
3c905C.

For the write() and send() tests, the chunk size was 64 kbytes.

The workload was 63 files with an average length of 350 kbytes.


    2.4.1-pre10+zerocopy, using sendfile(): 9.6%
    2.4.1-pre10+zerocopy, using send(): 24.1%
    2.4.1-pre10+zerocopy, using write(): 24.2%

    2.4.1-pre10+zerocopy, using sendfile(): 16.2% * checksums and SG disabled
    2.4.1-pre10+zerocopy, using send(): 21.5% * checksums and SG disabled
    2.4.1-pre10+zerocopy, using write(): 21.5% * checksums and SG disabled

    2.4.1-pre10-vanilla, using sendfile(): 17.1%
    2.4.1-pre10-vanilla, using send(): 21.1%
    2.4.1-pre10-vanilla, using write(): 21.1%

Bearing in mind that a large amount of the load is in the device
driver, the zerocopy patch makes a large improvement in sendfile
efficiency. But write() and send() performance is decreased by 10%,
and by more than this if you factor out the constant device driver overhead.
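The percentages can be checked against the table. A small helper (hypothetical; the base_load of 5.0 below is an illustrative guess, not a measurement) computes the relative change in client load between the two kernels, optionally after subtracting a constant driver overhead from both.

```python
def relative_change(before, after, base_load=0.0):
    """Relative change in client CPU load from `before` to `after` (both in
    percent), after subtracting a constant `base_load` that both kernels
    pay, such as device driver overhead. Negative means less CPU used."""
    if not 0.0 <= base_load < min(before, after):
        raise ValueError("base_load must be below both measurements")
    return (after - base_load) / (before - base_load) - 1.0

# Figures from the table: vanilla vs zerocopy, SG and checksums enabled.
for name, vanilla, zc in [("sendfile()", 17.1, 9.6),
                          ("send()", 21.1, 24.1),
                          ("write()", 21.1, 24.2)]:
    print(f"{name:11} {relative_change(vanilla, zc):+.2f} "
          f"{relative_change(vanilla, zc, base_load=5.0):+.2f}")
```

relative_change(17.1, 9.6) is about -0.44 for sendfile() and relative_change(21.1, 24.1) about +0.14 for send(). Subtracting an assumed constant 5.0% of driver load moves these to about -0.62 and +0.19, which is the sense in which factoring out the driver overhead makes both effects larger. Measured as throughput per unit of CPU (the link is saturated in every run), send() efficiency falls by 1 - 21.1/24.1, about 12%.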

TCP_CORK makes no difference, since the files being sent are much
larger than a single frame.


  For a NIC which cannot do scatter/gather/checksums, the zerocopy
  patch makes no change in throughput in any of these cases.

  For a NIC which can do scatter/gather/checksums, sendfile()
  efficiency is improved by 40% and send() efficiency is decreased by
  10%. The increase and decrease caused by the zerocopy patch will in
  fact be significantly larger than these two figures, because the
  measurements here include a constant base load caused by the device
  driver.
