Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)

From: Ion Badulescu (
Date: Sat Jan 27 2001 - 05:05:13 EST

On Sat, 27 Jan 2001 16:45:43 +1100, Andrew Morton <> wrote:

> The client is a 650 MHz PIII. The NIC is a 3CCFE575CT Cardbus 3com.
> It supports Scatter/Gather and hardware checksums. The NIC's interrupt
> is shared with the Cardbus controller, so this will impact throughput
> slightly.
> The kernels which were tested were 2.4.1-pre10 with and without the
> zerocopy patch. We only look at client load (the TCP sender).
> The link throughput was 11.5 mbytes/sec at all times (saturated 100baseT)
> 2.4.1-pre10-vanilla, using sendfile(): 29.6% CPU
> 2.4.1-pre10-vanilla, using read()/write(): 34.5% CPU
> 2.4.1-pre10+zerocopy, using sendfile(): 18.2% CPU
> 2.4.1-pre10+zerocopy, using read()/write(): 38.1% CPU
> 2.4.1-pre10+zerocopy, using sendfile(): 22.9% CPU * hardware tx checksums disabled
> 2.4.1-pre10+zerocopy, using read()/write(): 39.2% CPU * hardware tx checksums disabled

My setup: a 750MHz PIII with two NICs. One is an Adaptec Starfire, with
the driver modified to use hardware sg+csum (both Tx/Rx). The other is an
Intel i82559 (eepro100), no hardware csum support, vanilla driver.

The box has 512MB of RAM, and I'm using a 100MB file, so it's entirely cached.

2.4.1-pre10+zerocopy, using sendfile(): 9.6% CPU
2.4.1-pre10+zerocopy, using read()/write(): 18.3%-29.6% CPU * why so much variance?

2.4.1-pre10+zerocopy, using sendfile(): 17.4% CPU * hardware csum disabled
2.4.1-pre10+zerocopy, using read()/write(): 16.5%-26.8% CPU * idem, again why so much variance?

2.4.1-pre10-vanilla, using sendfile(): 16.5% CPU
2.4.1-pre10-vanilla, using read()/write(): 14.5%-24.5% CPU * high variance again

2.4.1-pre10+zerocopy, using sendfile(): 16.0% CPU
2.4.1-pre10+zerocopy, using read()/write(): 15.0%-24.5% CPU * why so much variance?

2.4.1-pre10-vanilla, using sendfile(): 16.7% CPU
2.4.1-pre10-vanilla, using read()/write(): 14.5%-24.6% CPU * high variance again

The read+write case is really weird. I'm getting results like this:

CPU load: 27.9491
CPU load: 25.4763
CPU load: 15.8544
CPU load: 25.455
CPU load: 25.2072
CPU load: 15.8677
CPU load: 25.4896
CPU load: 25.2791
CPU load: 15.8837

i.e. 2 slow, 1 fast, 2 slow, 1 fast, and so on and so forth.
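
The 2-slow/1-fast pattern can be made explicit with a few lines of Python. This is only a sketch over the nine samples listed above; the 20% cut-off between "slow" and "fast" runs is an assumption chosen for illustration, and the helper names are hypothetical.

```python
# CPU load samples copied from the read()/write() runs listed above.
SAMPLES = [27.9491, 25.4763, 15.8544, 25.455, 25.2072,
           15.8677, 25.4896, 25.2791, 15.8837]

THRESHOLD = 20.0  # assumed cut-off between "slow" and "fast" runs


def classify(samples, threshold=THRESHOLD):
    """Return one letter per sample: S for slow (high load), F for fast."""
    return "".join("S" if s > threshold else "F" for s in samples)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


pattern = classify(SAMPLES)
slow = [s for s in SAMPLES if s > THRESHOLD]
fast = [s for s in SAMPLES if s <= THRESHOLD]

print(pattern)                                   # SSFSSFSSF
print(round(mean(slow), 1), round(mean(fast), 1))  # 25.8 15.9
```

The two clusters are tight (about 25.8% and 15.9%), so this looks like the runs alternating between two distinct states rather than random noise. The samples alone do not say what those states are.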

> What can we conclude?
> - sendfile is 10% cheaper than read()-then-write() on 2.4.1-pre10.

Hard to tell, with such inconclusive results...
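
For readers unfamiliar with the two send paths being compared, here is a minimal sketch of each. It is not the test program used in this thread: it is written in Python, where os.sendfile() wraps sendfile(2), and the chunk size, function names, and the local socket pair in demo() are all assumptions made for illustration. It shows only the shape of the two loops and measures nothing.

```python
import os
import socket
import tempfile

CHUNK = 64 * 1024  # assumed buffer size; the original test program's value is not given


def send_with_sendfile(sock, path):
    """Send a whole file with sendfile(2): no copy through a userspace buffer."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected end of file
                break
            sent += n
    return sent


def send_with_read_write(sock, path):
    """Send a whole file by read()ing into a buffer and writing it to the socket."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            if not buf:
                break
            sock.sendall(buf)
            sent += len(buf)
    return sent


def recv_exactly(sock, count):
    """Read count bytes from sock (fewer only if the peer closes early)."""
    parts = []
    while count > 0:
        data = sock.recv(count)
        if not data:
            break
        parts.append(data)
        count -= len(data)
    return b"".join(parts)


def demo(payload=b"x" * 8192):
    """Push a small temp file through a local socket pair with both methods."""
    a, b = socket.socketpair()
    results = []
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        for send in (send_with_sendfile, send_with_read_write):
            n = send(a, tmp.name)
            results.append((n, recv_exactly(b, n) == payload))
    a.close()
    b.close()
    return results
```

Calling demo() sends the same 8192-byte file both ways and checks that the receiver sees identical data. Any CPU difference between the paths depends on the kernel, the NIC, and the driver, as the figures in this thread show.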

> - sendfile() with the zerocopy patch is 40% cheaper than
> sendfile() without the zerocopy patch.

Indeed. Close to 50% in fact.
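
The relative saving can be recomputed from the sendfile() figures quoted in this thread. The sketch below assumes that the 16.5% vanilla figure and the 9.6% zerocopy figure were measured on the same NIC, which the listing above does not state explicitly; relative_saving is a hypothetical helper name.

```python
def relative_saving(baseline, improved):
    """Percentage reduction in CPU load going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# sendfile() CPU load, vanilla vs. zerocopy, as quoted in this thread.
first_setup = relative_saving(29.6, 18.2)   # 650 MHz PIII, 3com Cardbus NIC
second_setup = relative_saving(16.5, 9.6)   # 750MHz PIII; NIC pairing is an assumption

print(f"{first_setup:.1f}% {second_setup:.1f}%")  # 38.5% 41.8%
```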

> - hardware Tx checksums don't make much difference. hmm...

Actually it makes all the difference in the world for the starfire.

> Bear in mind that the 3c59x driver uses a one-interrupt-per-packet
> algorithm. Mitigation reduces this to 0.3 ints/packet.
> So we're absorbing 4,500 interrupts/sec while processing
> 12,000 packets/sec. gigE NICs do much better mitigation than
> this and the relative benefits of zerocopy will be much higher
> for these. Hopefully Jamal can do some testing.

Hmm.. the starfire also has quite advanced interrupt mitigation,
but I have not played with it. Maybe tomorrow. So these results
are with one-interrupt-per-packet.

P.S. The starfire still doesn't like tinygrams (skb's with 1-byte
fragments). Fortunately your test program doesn't seem to generate
them. :-)


  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.
