--8323584-1436868820-920816185=:1813
Content-Type: TEXT/PLAIN; charset=US-ASCII
Several people have already reported problems with TCP stalls (Netscape,
XEmacs/Gnus, etc.). I did some debugging and believe I have found out
when and why this happens.
Here again is a typical trace:
> 13:53:20.339402 lh.3333 > lh.1284: P 25161:26641(1480) ack 1 win 31072 (DF)
> 13:53:20.339450 lh.1284 > lh.3333: . ack 26641 win 4432 (DF)
> 13:53:20.339864 lh.3333 > lh.1284: P 26641:28121(1480) ack 1 win 31072 (DF)
> 13:53:20.339903 lh.1284 > lh.3333: . ack 28121 win 2952 (DF)
> 13:53:20.340299 lh.3333 > lh.1284: P 28121:29601(1480) ack 1 win 31072 (DF)
> 13:53:20.340337 lh.1284 > lh.3333: . ack 29601 win 1472 (DF)
*** stall ***
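The window arithmetic in the trace can be checked with a small C sketch. It assumes, as the trace suggests, that the reader's application consumes nothing while these segments arrive, so each 1480-byte segment shrinks the advertised window by exactly 1480 bytes. The function name window_after() is made up for this illustration; it is not a kernel function.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not kernel code): returns the receive window left
 * after up to `segments` segments of `seg_len` bytes have arrived and
 * nothing has been read. Stops as soon as a segment no longer fits.
 */
unsigned window_after(unsigned win, unsigned seg_len, unsigned segments)
{
    unsigned i;

    for (i = 0; i < segments; i++) {
        if (seg_len > win)
            break;              /* the next full segment does not fit */
        win -= seg_len;
    }
    return win;
}

/* Prints the window after 0, 1, 2 and 3 segments, starting from win 4432. */
void print_trace_windows(void)
{
    unsigned n;

    for (n = 0; n <= 3; n++)
        printf("after %u segment(s): win %u\n", n, window_after(4432, 1480, n));
}
```

Starting from win 4432, two more 1480-byte segments leave win 2952 and then win 1472, matching the ACKs above. A third 1480-byte segment no longer fits into the remaining 1472 bytes.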
As Stanislav Meduna already posted, one part of the problem is in
cleanup_rbuf() in tcp.c:
    if((copied >= rcv_window_now) &&
       ((rcv_window_now + tp->mss_cache) <= tp->window_clamp))
        tcp_read_wakeup(sk);
The first check is meant to prevent unnecessary ACKs. Unfortunately
it sometimes also prevents advertising an increased window.
When the reader always reads small chunks of data, say 15 bytes,
the check always fails, since "copied" is always 15 while
rcv_window_now is usually larger.
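To make the effect of the check concrete, here is a minimal user-space model of the quoted condition. Plain integers stand in for the kernel fields, and the function name would_send_window_update() is invented for this sketch. The values used for mss_cache and window_clamp in the note below are assumptions for illustration, not values taken from a running kernel.

```c
#include <stdbool.h>

/*
 * User-space model of the condition in cleanup_rbuf() quoted above.
 * Returns true when the kernel would call tcp_read_wakeup(), i.e. send
 * an ACK carrying the updated window.
 */
bool would_send_window_update(unsigned copied, unsigned rcv_window_now,
                              unsigned mss_cache, unsigned window_clamp)
{
    return (copied >= rcv_window_now) &&
           ((rcv_window_now + mss_cache) <= window_clamp);
}
```

With an assumed mss_cache of 1480 and window_clamp of 32767, a 15-byte read against a current window of 1472 gives would_send_window_update(15, 1472, 1480, 32767) == false, so no window update is sent. Only a read of at least 1472 bytes would pass the first check.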
But why does the sender not simply send 1472 bytes after receiving
13:53:20.340337 lh.1284 > lh.3333: . ack 29601 win 1472 (DF) ?
The reader would then advertise win 0 and everything would be fine
(the sender would use TIME_PROBE0 to periodically probe the window size
of the reader).
The reason is that the SendQ is empty. This is the timing issue
that made the problem look like a race.
Since the SendQ is now empty, the kernel wakes up the sender process.
The sender continues calling write(), which invokes tcp_v4_sendmsg()
in tcp_ipv4.c. From there, there are two paths by which data could
actually get transmitted:
1. In tcp_v4_sendmsg() -> tcp_do_sendmsg() ->
       tcp_send_skb(sk, skb, queue_it);
   For small write sizes ( < mss and < max_window/2 ) "queue_it"
   is always true. This prevents any transmission; the data is
   only queued.
2. In tcp_v4_sendmsg() ->
       if(tp->send_head && tcp_snd_test(sk, tp->send_head))
           tcp_write_xmit(sk);
   The same applies here: tcp_snd_test() always fails for small
   write() sizes.
** Summary **
A TCP stall occurs when:
1. The reader sends an ACK advertising a win < mss / 2.
2. The sender's SendQ is empty at that moment.
3. The sender's subsequent write() calls use write sizes < mss.
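The three conditions can be written down as a single predicate. This is only a sketch of the summary above, not kernel logic; the function name stall_expected() and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate combining the three stall conditions:
 *   1. the last advertised window is smaller than mss / 2,
 *   2. the sender's send queue is empty,
 *   3. the sender's write() size is smaller than the mss.
 */
bool stall_expected(size_t advertised_win, size_t mss,
                    size_t sendq_bytes, size_t write_size)
{
    bool small_window = advertised_win < mss / 2;
    bool sendq_empty  = (sendq_bytes == 0);
    bool small_write  = write_size < mss;

    return small_window && sendq_empty && small_write;
}
```

For example, assuming an mss of 3884 (an assumed value derived from mtu 3924 minus 40 bytes of IP and TCP headers), stall_expected(1472, 3884, 0, 3871) is true, while a write of a full mss, stall_expected(1472, 3884, 0, 3884), is false.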
The attached program demonstrates this. It creates a sender
and a reader. The sender sends 57500 bytes to the reader while
the reader sleeps.
netstat correctly reports:
Proto Recv-Q Send-Q Local Address     Foreign Address   State
tcp        0      0 localhost:3334    localhost:1033    ESTABLISHED
tcp    57500      0 localhost:1033    localhost:3334    ESTABLISHED
Now the reader wakes up and reads the data while the sender sleeps
for 10 seconds. Then the sender writes another 3871 bytes, which is the
critical size on my machine (mtu 3924).
Proto Recv-Q Send-Q Local Address     Foreign Address   State
tcp        0   3872 localhost:3334    localhost:1033    FIN_WAIT1
tcp        0      0 localhost:1033    localhost:3334    ESTABLISHED
There is the stall. No further write() will cause any data transmission
as long as the write size is <= 3871.
I cannot say whether the sender or the reader side is wrong. I would say
both, but I still have to read the RFCs :).
Any comments?
-- Matthias
--8323584-1436868820-920816185=:1813
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="sc.c"
Content-Transfer-Encoding: 7BIT
Content-ID: <Pine.LNX.3.96.990307151625.1813C@ice.robin.de>
Content-Description: sc.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/un.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <signal.h>
#include <netdb.h>

#define WSIZE1 57500    /* first write, sent while the reader sleeps */
#define WSIZE2 3871     /* second write, the critical size */
#define RSIZE 15        /* chunk size used by the reader */
#define PORT 3334

char buffer[WSIZE1 + WSIZE2];

/* Sender: accepts one connection and writes WSIZE1, then WSIZE2 bytes. */
void server(void)
{
    struct sockaddr_in addr, accept_addr;
    socklen_t addrlen = sizeof(accept_addr);
    int mainSock, sock;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if((mainSock = socket(addr.sin_family, SOCK_STREAM, 0)) < 0)
    { perror("socket()"); exit(1); }

    if( bind(mainSock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
    { perror("bind()");   exit(1); }

    if( listen(mainSock, 1) < 0)
    { perror("listen()"); exit(1); }

    sock = accept(mainSock, (struct sockaddr *)&accept_addr, &addrlen);
    if(sock < 0) { perror("accept()"); abort(); }

    /* First write: queued at the reader, which is still sleeping. */
    write(sock, buffer, WSIZE1);

    sleep(10);

    /* Second write: smaller than the mss, this is the one that stalls. */
    write(sock, buffer, WSIZE2);

    sleep(3);
    close(sock);
    close(mainSock);
    abort();    /* terminate the child so it never runs client() */
}

/* Reader: connects, sleeps, then reads everything in RSIZE-byte chunks. */
static void client(void)
{
    struct sockaddr_in addr;
    struct hostent *hostent;
    char buf[RSIZE];
    int len, n, sock;

    if(!(hostent = gethostbyname("localhost")))
    { printf("Could not resolve host!\n"); exit(1); }

    memset(&addr, 0, sizeof(addr));
    addr.sin_port = htons(PORT);
    addr.sin_family = hostent->h_addrtype;
    memcpy(&addr.sin_addr, hostent->h_addr, hostent->h_length);

    if((sock = socket(addr.sin_family, SOCK_STREAM, 0)) < 0)
    { perror("socket()"); exit(1); }

    if(connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
    { perror("connect()"); exit(1); }

    sleep(5);

    len = 0;
    while(len < WSIZE1 + WSIZE2) {
        n = read(sock, buf, RSIZE);
        if(n <= 0)      /* stop on EOF or error instead of looping */
            break;
        len += n;
    }

    close(sock);
}

int main(int argc, char **argv)
{
    int i;

    i = fork();
    if(!i)
        server();

    sleep(2);
    client();
    wait(&i);

    return 0;
}
--8323584-1436868820-920816185=:1813--