Re: ZONE_NORMAL memory exhausted by 4000 TCP sockets

From: Stephen Hemminger
Date: Mon Nov 06 2006 - 11:36:53 EST


Eric Dumazet wrote:
Zhao Xiaoming a écrit :
Dears,
I'm running a linux box with kernel version 2.6.16. The hardware
has 2 Woodcrest Xeon CPUs (2 cores each) and 4G RAM. The NIC cards is
Intel 82571 on PCI-e bus.
The box is acting as ethernet bridge between 2 Gigabit Ethernets.
By configuring ebtables and iptables, an application is running as TCP
proxy which will intercept all TCP connections requests from the
network and setup another TCP connection to the acture server. The
TCP proxy then relays all traffics in both directions.
The problem is the memory. Since the box must support thousands of
concurrent connections, I know the memory size of ZONE_NORMAL would be
a bottleneck as TCP packets would need many buffers. After setting
upper limit of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem to 32K bytes,
our test began.
My test scenario employs 2000 concurrent downloading connections
to a IIS server's port 80. The throughput is about 500~600 Mbps which
is limited by the capability of the client application. Because all
traffics are from server to client and the capability of client
machine is bottleneck, I believe the receiver side of the sockets
connected with server and the sender side of the sockets connected
with client should be filled with packets in correspondent windows.
Thus, roughly there should be about 32K * 2000+ 32K*2000 = 128M bytes
memory occupied by TCP/IP stack for packet buffering. Data from
slabtop confermed it. it's about 140M bytes memory cost after I start
the traffic. That reasonablly matched with my estimation. However,
/proc/meminfo had a different story. The 'LowFree' dropped from about
710M to 80M. In other words, there's addtional 500M memory in
ZONE_NORMAL allocated by someone other than the slab. Why?
The amount of memory per socket is controlled by the socket buffering. Your application
could be setting the value by calling setsockopt(). Otherwise, the tcp memory is limited
by the sysctl settings tcp_rmem (receiver) and tcp_wmem (sender).

For example on this server:
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 131072

Each sending socket would start with 16K of buffering, but could grow up to 128K based
on TCP send autotuning.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/