[PATCH 3.4 89/91] net: Do not enable tx-nocache-copy by default

From: lizf
Date: Thu Nov 27 2014 - 03:51:48 EST


From: Benjamin Poirier <bpoirier@xxxxxxx>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c upstream.

There are many cases where this feature does not improve performance or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
meanÂstddev over 10 runs of 10s.

1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
692000Â1000 tps
50/90/95/99% latency (us): 275Â2/643.8Â0.4/799Â1/2474.4Â0.3
tx-nocache-copy on
693000Â1000 tps
50/90/95/99% latency (us): 274Â1/644.1Â0.7/800Â2/2474.5Â0.7

200x netperf -r 14000,14000
tx-nocache-copy off
86450Â80 tps
50/90/95/99% latency (us): 334.37Â0.02/838Â1/2100Â20/3990Â40
tx-nocache-copy on
86110Â60 tps
50/90/95/99% latency (us): 334.28Â0.01/837Â2/2110Â20/3990Â20

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

throughput cpu0 cpu1 demand
(Gb/s) (Gcycle) (Gcycle) (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off 9402Â5 9.4Â0.2 0.80Â0.01
tx-nocache-copy on 9403Â3 9.85Â0.04 0.838Â0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off 9401Â5 5.83Â0.03 5.0Â0.1 0.923Â0.007
tx-nocache-copy on 9404Â2 5.74Â0.03 5.523Â0.009 0.958Â0.002

As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand

(cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000

Performance counter stats for './netperf -H lpq84 -c':

4282.648396 task-clock # 0.423 CPUs utilized
9,348 context-switches # 0.002 M/sec
88 CPU-migrations # 0.021 K/sec
355 page-faults # 0.083 K/sec
11,812,797,651 cycles # 2.758 GHz [82.79%]
9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%]
4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%]
6,053,172,792 instructions # 0.51 insns per cycle
# 1.49 stalled cycles per insn [83.64%]
597,275,583 branches # 139.464 M/sec [83.70%]
8,960,541 branch-misses # 1.50% of all branches [83.65%]

10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000

Performance counter stats for './netperf -H lpq84 -c':

2847.375441 task-clock # 0.281 CPUs utilized
11,632 context-switches # 0.004 M/sec
49 CPU-migrations # 0.017 K/sec
354 page-faults # 0.124 K/sec
7,646,889,749 cycles # 2.686 GHz [83.34%]
6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%]
1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%]
2,079,702,453 instructions # 0.27 insns per cycle
# 2.94 stalled cycles per insn [83.22%]
363,773,213 branches # 127.757 M/sec [83.29%]
4,242,732 branch-misses # 1.17% of all branches [83.51%]

10.128449949 seconds time elapsed

CC: Tom Herbert <therbert@xxxxxxxxxx>
Signed-off-by: Benjamin Poirier <bpoirier@xxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
---
net/core/dev.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b47375d..0770364 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5546,13 +5546,8 @@ int register_netdevice(struct net_device *dev)
dev->features |= NETIF_F_SOFT_FEATURES;
dev->wanted_features = dev->features & dev->hw_features;

- /* Turn on no cache copy if HW is doing checksum */
if (!(dev->flags & IFF_LOOPBACK)) {
dev->hw_features |= NETIF_F_NOCACHE_COPY;
- if (dev->features & NETIF_F_ALL_CSUM) {
- dev->wanted_features |= NETIF_F_NOCACHE_COPY;
- dev->features |= NETIF_F_NOCACHE_COPY;
- }
}

/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/