tcp writev() bug in 2.1.x

Dean Gaudet (dgaudet-list-linux-kernel@arctic.org)
Thu, 5 Mar 1998 22:35:37 -0800 (PST)


Eric Schenk fixed this bug in 2.0.31, but it is still present in 2.1.x. The
attached test-writev.c program sets TCP_NODELAY and sends three 100-byte
buffers with a single writev() call. On a 2.1.x kernel (or a 2.0.x kernel
with x < 31) this puts three segments on the wire instead of one.

Usage: ./test-writev a.b.c.d port#
(I usually point it at a nearby web server.)

Here's a tcpdump. 10.7.63.4 is a 2.1.86 box and 10.7.63.6 is a 2.0.33 box.
Both are running web servers on port 8080, and I ran the test in both
directions to show the difference: first from the 2.0.33 box to the 2.1.86
box, then the other way around.

23:22:02.960371 10.7.63.6.5735 > 10.7.63.4.8080: S 1781928942:1781928942(0) win 512 <mss 1460>
23:22:02.960500 10.7.63.4.8080 > 10.7.63.6.5735: S 4026453022:4026453022(0) ack 1781928943 win 32120 <mss 1460> (DF)
23:22:02.961013 10.7.63.6.5735 > 10.7.63.4.8080: . ack 1 win 32120 (DF)
23:22:02.961948 10.7.63.6.5735 > 10.7.63.4.8080: P 1:301(300) ack 1 win 32120 (DF)
^^^^^^^^^^ note one 300 byte packet
23:22:02.963124 10.7.63.6.5735 > 10.7.63.4.8080: F 301:301(0) ack 1 win 32120
23:22:02.963215 10.7.63.4.8080 > 10.7.63.6.5735: . ack 302 win 32120 (DF)
23:22:02.968917 10.7.63.4.8080 > 10.7.63.6.5735: P 1:452(451) ack 302 win 32120 (DF)
23:22:02.969571 10.7.63.4.8080 > 10.7.63.6.5735: F 452:452(0) ack 302 win 32120 (DF)
23:22:02.970320 10.7.63.6.5735 > 10.7.63.4.8080: R 1781929244:1781929244(0) win 0
23:22:02.970446 10.7.63.6.5735 > 10.7.63.4.8080: R 1781929244:1781929244(0) win 0

23:22:10.170960 10.7.63.4.1660 > 10.7.63.6.8080: S 4031765920:4031765920(0) win 32120 <mss 1460> (DF)
23:22:10.171624 10.7.63.6.8080 > 10.7.63.4.1660: S 2422489659:2422489659(0) ack 4031765921 win 32736 <mss 1460>
23:22:10.171741 10.7.63.4.1660 > 10.7.63.6.8080: . ack 1 win 32120 (DF)
23:22:10.173215 10.7.63.4.1660 > 10.7.63.6.8080: . 1:101(100) ack 1 win 32120 (DF)
^^^^^^^^^ 100 bytes
23:22:10.173317 10.7.63.4.1660 > 10.7.63.6.8080: . 101:201(100) ack 1 win 32120 (DF)
^^^^^^^^^ 100 bytes
23:22:10.338739 10.7.63.6.8080 > 10.7.63.4.1660: . ack 201 win 32736 (DF)
23:22:10.338866 10.7.63.4.1660 > 10.7.63.6.8080: P 201:301(100) ack 1 win 32120 (DF)
^^^^^^^^^ 100 bytes
23:22:10.338955 10.7.63.4.1660 > 10.7.63.6.8080: F 301:301(0) ack 1 win 32120 (DF)
23:22:10.339621 10.7.63.6.8080 > 10.7.63.4.1660: . ack 302 win 32635 (DF)
23:22:10.347525 10.7.63.6.8080 > 10.7.63.4.1660: P 1:451(450) ack 302 win 32736 (DF)
23:22:10.347625 10.7.63.4.1660 > 10.7.63.6.8080: R 4031766222:4031766222(0) win 0
23:22:10.347553 10.7.63.6.8080 > 10.7.63.4.1660: F 451:451(0) ack 302 win 32736
23:22:10.347693 10.7.63.4.1660 > 10.7.63.6.8080: R 4031766222:4031766222(0) win 0
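
To compare the two traces mechanically, a small helper can count the segments that carry payload from a given sender. This is a sketch, not part of the original report: payload_len() and count_data_segments() are hypothetical names, and the parser assumes tcpdump's "first:last(len)" sequence notation exactly as it appears in the traces above. Everything else on a line is ignored, and a segment with len 0 (SYN, FIN, RST) is not counted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the payload length in one tcpdump line,
 * taken from its "first:last(len)" field, or 0 if there is none. */
static unsigned long payload_len(const char *line)
{
    for (const char *p = line; *p != '\0'; p++) {
        unsigned long first, last, len;
        /* only try to parse at the start of a space-separated token */
        if (p != line && p[-1] != ' ')
            continue;
        /* the timestamp "hh:mm:ss" stops after 2 conversions, so it never matches */
        if (sscanf(p, "%lu:%lu(%lu)", &first, &last, &len) == 3)
            return len;
    }
    return 0;
}

/* Hypothetical helper: count lines sent by src (e.g. "10.7.63.4.1660")
 * whose segment carries at least one byte of payload. */
static int count_data_segments(const char *const *lines, size_t n, const char *src)
{
    assert(lines != NULL && src != NULL);
    size_t srclen = strlen(src);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        const char *q = strstr(lines[i], src);
        /* the sender is the address immediately followed by " > " */
        if (q == NULL || strncmp(q + srclen, " > ", 3) != 0)
            continue;
        if (payload_len(lines[i]) > 0)
            count++;
    }
    return count;
}
```

Fed the lines above, it should count one data segment from 10.7.63.6.5735 (the 2.0.33 client) and three from 10.7.63.4.1660 (the 2.1.86 client).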

Motivation: Apache 1.3 uses writev().

Dean

Below is the patch that fixed writev() in 2.0.31. I figure it may be useful
to someone wanting to bring it forward to 2.1.x.

--- vanilla/linux/net/ipv4/tcp.c Wed Apr 9 20:31:10 1997
+++ linux/net/ipv4/tcp.c Wed May 7 23:11:08 1997
sk->write_seq += copy;
seglen -= copy;
}
- if (tcp_size >= sk->mss || (flags & MSG_OOB) || !sk->packets_out)
+ /* If we have a full packet or a new OOB message, we have to
+ * force this packet out.
+ */
+ if (tcp_size >= sk->mss || (flags & MSG_OOB))
tcp_send_skb(sk, skb);
else
tcp_enqueue_partial(skb, sk);
@@ -1290,8 +1286,14 @@

delay = 0;
tmp = copy + sk->prot->max_header + 15;
- if (copy < sk->mss && !(flags & MSG_OOB) && sk->packets_out)
- {
+ /* If won't fill the current packet, and it's not an OOB message,
+ * then we might want to delay to allow data in the later parts
+ * of iov to fill this packet out. Note that if we aren't
+ * Nagling or there are no packets currently out then the top
+ * level code in tcp_sendmsg() will force any partial packets out
+ * after we finish building the largest packets this write allows.
+ */
+ if (copy < sk->mss && !(flags & MSG_OOB)) {
tmp = tmp - copy + sk->mtu + 128;
delay = 1;
}
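
The effect of the patch can be illustrated with a toy model of the two send conditions shown in the diff. This is a sketch under stated assumptions, not kernel code. struct seg_model, model_init(), emit() and model_writev() are hypothetical names. The model has no OOB data, no receive window and no ACKs arriving during the call. Its end-of-call flush follows the patch comment: a partial segment is forced out when Nagle is off or nothing is in flight. It does not try to reproduce the 2.1.x code path, so its count for the old rule need not match the 2.1.86 trace.

```c
#include <assert.h>

#define MAX_SEGS 16

/* Toy model of one TCP sender; every name here is hypothetical. */
struct seg_model {
    int mss;              /* maximum segment size */
    int nonagle;          /* 1 if TCP_NODELAY is set */
    int packets_out;      /* segments sent and not yet ACKed (no ACKs are modeled) */
    int partial;          /* bytes waiting in the pending partial segment */
    int sent[MAX_SEGS];   /* payload size of each segment put on the wire */
    int nsent;
};

static struct seg_model model_init(int mss, int nonagle)
{
    struct seg_model m = {0};
    m.mss = mss;
    m.nonagle = nonagle;
    return m;
}

/* Put the pending partial segment on the wire. */
static void emit(struct seg_model *m)
{
    assert(m->nsent < MAX_SEGS);
    m->sent[m->nsent++] = m->partial;
    m->packets_out++;
    m->partial = 0;
}

/* One writev() call. old_rule != 0 selects the pre-2.0.31 condition
 * "full segment || nothing in flight"; otherwise only a full segment
 * forces a send. OOB data is not modeled. */
static void model_writev(struct seg_model *m, const int *iov_len, int iovcnt, int old_rule)
{
    for (int i = 0; i < iovcnt; i++) {
        int left = iov_len[i];
        while (left > 0) {
            int room = m->mss - m->partial;
            int copy = left < room ? left : room;
            m->partial += copy;
            left -= copy;
            if (m->partial >= m->mss || (old_rule && m->packets_out == 0))
                emit(m);
        }
    }
    /* End of the call: force out a partial segment when Nagle is off
     * or nothing is in flight, as the patch comment describes. */
    if (m->partial > 0 && (m->nonagle || m->packets_out == 0))
        emit(m);
}
```

With three 100-byte iovecs, TCP_NODELAY set and an MSS of 1460, the new rule leaves a single 300-byte segment, matching the 2.0.33 trace. The old rule sends the first 100 bytes immediately because nothing is in flight, then flushes the remaining 200 bytes at the end of the call.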

Attachment: test-writev.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>    /* TCP_NODELAY */
#include <arpa/inet.h>

#ifndef INADDR_NONE
#define INADDR_NONE 0xffffffff
#endif

int main( int argc, char **argv )
{
    struct sockaddr_in server_addr;
    int s;
    struct iovec vector[3];
    char buf[100];
    int i;
    const int just_say_no = 1;

    if( argc != 3 ) {
usage:
        fprintf( stderr, "usage: test-writev a.b.c.d port#\n" );
        exit( 1 );
    }
    memset( &server_addr, 0, sizeof( server_addr ) );
    server_addr.sin_family = AF_INET;
    server_addr.sin_addr.s_addr = inet_addr( argv[1] );
    if( server_addr.sin_addr.s_addr == INADDR_NONE ) {
        fprintf( stderr, "bogus address\n" );
        goto usage;
    }
    server_addr.sin_port = htons( atoi( argv[2] ) );

    s = socket( AF_INET, SOCK_STREAM, 0 );
    if( s < 0 ) {
        perror( "socket" );
        exit( 1 );
    }
    if( connect( s, (struct sockaddr *)&server_addr, sizeof( server_addr ) ) != 0 ) {
        perror( "connect" );
        exit( 1 );
    }

    /* disable the Nagle algorithm */
    if( setsockopt( s, IPPROTO_TCP, TCP_NODELAY, (char *)&just_say_no,
                    sizeof( just_say_no ) ) != 0 ) {
        perror( "TCP_NODELAY" );
        exit( 1 );
    }

    /* now build up a three-part writev (3 x 100 bytes) and write it out */
    memset( buf, 'x', sizeof( buf ) );
    for( i = 0; i < 3; ++i ) {
        vector[i].iov_base = buf;
        vector[i].iov_len = sizeof( buf );
    }

    i = writev( s, vector, 3 );
    fprintf( stdout, "i=%d, errno=%d\n", i, errno );
    return 0;
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu