Tuning e1000 driver to avoid interface lock down

From: Kiran Kiran
Date: Mon Feb 23 2004 - 03:20:44 EST


Hi all,

What are the best values for RxIntDelay, RxDescriptors, TxIntDelay and TxDescriptors for the e1000 driver to get the maximum connection rate? My tests on a dual CPU box running the RHEL 3.0 SMP kernel (2.4.21-4ELsmp) cause the network interface to lock down when the irqbalance process is killed. With irqbalance running, the whole system locks down.

I have set txqueuelen to 20000, killed the irqbalance process, and set up smp_affinity so that CPU1 (on a dual PIII box) handles all interrupts generated by this device. When pounding the box with about 10000 HTTP conn/sec, I see CPU1 utilized at about 60% and CPU0 at about 10%. (BTW, I was using a trivial file size of 32 bytes ... don't ask why :-) After about 5 minutes the connection rate drops to 0 and ifconfig shows that the NIC is dropping packets. The following is the output of ifconfig:

RX packets:13468106 errors:2206 dropped:2206 overruns:1558 frame:0
TX packets:13466855 errors:0 dropped:0 overruns:0 carrier:4
collisions:0 txqueuelen:20000
RX bytes:1277599772 (1218.4 Mb) TX bytes:1277420502 (1218.2 Mb)
Interrupt:24 Base address:0xc8e0 Memory:fe920000-fe940000
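
For reference, the queue length and IRQ affinity setup described above can be sketched roughly as follows. This is only a sketch: the interface name eth0 is an assumption (it is not stated here), IRQ 24 is taken from the ifconfig output above, and the APPLY guard is a hypothetical safety switch so nothing is changed unless asked for.

```shell
#!/bin/sh
# smp_affinity takes a hex CPU bitmask: bit N set means CPU N may
# service the interrupt. cpu_mask prints the mask for a single CPU.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

IFACE=eth0   # assumed interface name, not given in this post
IRQ=24       # from "Interrupt:24" in the ifconfig output

# Only touch the system when explicitly requested (needs root).
if [ "${APPLY:-0}" = 1 ]; then
    ifconfig "$IFACE" txqueuelen 20000
    # Pin the NIC interrupt to CPU1 (mask 2).
    cpu_mask 1 > "/proc/irq/$IRQ/smp_affinity"
fi
```

Running it as `APPLY=1 sh script.sh` would apply the settings. irqbalance has to stay stopped, otherwise it may rewrite the mask.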

I set RxDescriptors to 4096 and RxIntDelay to 0 (BTW I see this even with RxIntDelay of 64). I also set the following:

echo 300000 > /proc/sys/net/core/hot_list_length
echo 300000 > /proc/sys/net/core/netdev_max_backlog
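
The driver parameters mentioned above are passed to e1000 at module load time. A minimal sketch, assuming a single e1000 port and that the module is not built into the kernel; the helper name e1000_load_cmd is made up for illustration and only prints the command rather than running it:

```shell
#!/bin/sh
# Build the module load command for a given RxDescriptors / RxIntDelay
# pair. With several e1000 ports, each parameter takes a comma-separated
# list with one value per port.
e1000_load_cmd() {
    echo "modprobe e1000 RxDescriptors=$1 RxIntDelay=$2"
}

# Dry run: print the command for the values quoted above.
# The driver must be unloaded first (interface down, rmmod e1000)
# for new parameter values to take effect.
e1000_load_cmd 4096 0
```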

The rest of the TCP tunables were obtained from the SpecWeb results pages.
http://www.spec.org/osg/web99/results/res2003q3/web99-20030818-00245.html

Also, what does the "idle=poll" setting suggested on the SpecWeb results page mean? I have not set this in my tests. Will it make any difference? If so, what and why?

I'd really appreciate it if someone could help me choose the correct values to tune the kernel, driver and card to stop the interface/system from locking down. Another thing: when the interface locks down (i.e., no network activity on it), I see the following CPU utilization:

CPU states:  cpu    user    nice  system     irq  softirq  iowait    idle
           total    0.0%    0.0%    0.5%    0.0%    50.0%    0.0%   49.5%
           cpu00    0.0%    0.0%    1.0%    0.0%     0.0%    0.0%   99.0%
           cpu01    0.0%    0.0%    0.0%    0.0%   100.0%    0.0%    0.0%

It appears as if CPU1 is busy running the ksoftirqd/1 kernel thread. Here are the contents of /proc/6/stat for this process:

% cat /proc/6/stat
6 (ksoftirqd/1) S 1 1 1 0 -1 64 0 0 0 0 0 0 0 0 34 19 0 0 125 0 0 4294967295 0 0 0 0 0 0 2147483647 0 0 3222466767 0 0 17 1 0 0 0 0 0 0

And, if it is of any help, the output of softnet_stat:

% cat /proc/net/softnet_stat
000c9732 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000003
00cd9256 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000004
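
The softnet_stat values are hexadecimal, one line per CPU. A small sketch for turning the first three columns into decimal follows; the column meanings (packets processed, dropped, time_squeeze) are my reading of the 2.4 kernel sources and should be treated as an assumption, and decode_softnet is just an illustrative helper name.

```shell
#!/bin/sh
# Read softnet_stat lines on stdin and print the first three hex
# columns in decimal, numbering the CPUs in line order.
# Assumed column order: total processed, dropped, time_squeeze.
decode_softnet() {
    cpu=0
    while read -r total dropped squeezed rest; do
        echo "cpu$cpu total=$((0x$total)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        cpu=$((cpu + 1))
    done
}
```

Usage would be `decode_softnet < /proc/net/softnet_stat`. For the two lines above it gives totals of 825138 for cpu0 and 13472342 for cpu1, with the dropped and time_squeeze columns both at zero.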

Any help will be greatly appreciated.

tx
Yedok


-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html