[patch] TCP/IP delacks disabled with MPI

Andrea Arcangeli (andrea@suse.de)
Fri, 21 May 1999 03:34:45 +0200 (CEST)


Prompted by some email I received over the last few days, I have ported my
TCP patch from 2.2.2 to 2.2.9.

A bit of history: some months ago, in order to improve the responsiveness of
MPI applications running over TCP/IP, I had the idea of disabling delayed acks
when TCP_NODELAY has been requested in the socket options.
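
For reference, the application side of this is an ordinary setsockopt() call.
A minimal userspace sketch follows; the helper names enable_tcp_nodelay() and
tcp_nodelay_enabled() are made up for illustration and are not part of any
MPI library or of the patch:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: request TCP_NODELAY on a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_tcp_nodelay(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if set, 0 if clear, -1 on error. */
int tcp_nodelay_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A socket configured this way is what the kernel side of the patch tests for
with sk->nonagle == 1.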

Here is the old 2.2.2 patch ported to 2.2.9:

Index: linux/net/ipv4//tcp_input.c
===================================================================
RCS file: /var/cvs/linux/net/ipv4/tcp_input.c,v
retrieving revision 1.1.1.10
diff -u -r1.1.1.10 tcp_input.c
--- linux/net/ipv4//tcp_input.c 1999/05/12 11:37:05 1.1.1.10
+++ linux/net/ipv4//tcp_input.c 1999/05/21 00:04:11
@@ -1585,7 +1585,9 @@
/* We entered "quick ACK" mode or... */
tcp_in_quickack_mode(tp) ||
/* We have out of order data */
- (skb_peek(&tp->out_of_order_queue) != NULL)) {
+ !skb_queue_empty(&tp->out_of_order_queue) ||
+ /* TCP_NODELAY is set, this improves MPI performance. -Andrea */
+ sk->nonagle == 1) {
/* Then ack it now */
tcp_send_ack(sk);
} else {

But now I have a new idea:

One of the things I noticed in the tcp traces Josip sent me is that
currently we send the ack only if the ATO expires or we have received more
than 2*MSS of data, but we don't care about the exact _number_ of not-yet-acked
packets we have in the receive queue. If I remember well, MPI packets are
really tiny (a few bytes each), so before getting an ack there were _tons_
of incoming tinygram packets.
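
To put a rough number on this, here is a small model that takes the rule
exactly as described above (an ack is forced only once more than 2*MSS bytes
are unacked) and ignores the ATO timer entirely. It is an illustration, not
kernel code; the function name and the example sizes are assumptions and do
not come from the traces:

```c
/* Illustrative model only: number of equal-sized segments that must arrive
 * before the byte rule ("more than 2*MSS unacked") forces an ack.
 * The ATO timer is deliberately ignored.
 * Returns 0 for seg_size == 0, meaning the byte rule never fires. */
unsigned int segments_until_byte_rule_acks(unsigned int seg_size,
                                           unsigned int mss)
{
    if (seg_size == 0)
        return 0;
    /* Smallest n such that n * seg_size > 2 * mss. */
    return (2 * mss) / seg_size + 1;
}
```

With an assumed MSS of 1460 bytes and 8-byte payloads this gives 366 segments,
so in that case only the ATO would produce an ack in any reasonable time.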

I don't remember whether the TCP RFCs say that we must force an ack at least
after we have received 2*MSS of data or after _two_ frames of any size. If I
remember well, last time I checked it was not clear to me (hints?).

Anyway, I think my new patch below may be interesting too. It still lets us
merge the ack into the next send (because it doesn't kill delayed acks), but
makes us much more responsive with TCP_NODELAY. It is similar to the old
patch above but less aggressive, and being less aggressive may improve
performance by avoiding many ack-only packets on the network. I would like
people to compare the old brute-force patch with this new, nicer one.
Thanks!

(patch against 2.3.3, but it should also apply cleanly against 2.2.9)

Index: linux/net/ipv4/tcp_input.c
===================================================================
RCS file: /var/cvs/linux/net/ipv4/tcp_input.c,v
retrieving revision 1.1.1.11
diff -u -r1.1.1.11 tcp_input.c
--- linux/net/ipv4/tcp_input.c 1999/05/16 20:56:05 1.1.1.11
+++ linux/net/ipv4/tcp_input.c 1999/05/21 01:26:02
@@ -1575,6 +1591,25 @@
}
}

+static __inline__ int tcp_nr_packets_not_acked(int nr, struct sock * sk,
+ struct tcp_opt * tp)
+{
+ int __nr = 0;
+ struct sk_buff * skb = (struct sk_buff *) &sk->receive_queue;
+
+ while ((skb = skb->prev) != (struct sk_buff *) &sk->receive_queue)
+ {
+ if (after(TCP_SKB_CB(skb)->end_seq, tp->last_ack_sent))
+ {
+ if (++__nr >= nr)
+ return 1;
+ }
+ else
+ break;
+ }
+ return 0;
+}
+
/*
* Check if sending an ack is needed.
*/
@@ -1603,7 +1638,14 @@
/* We entered "quick ACK" mode or... */
tcp_in_quickack_mode(tp) ||
/* We have out of order data */
- (skb_peek(&tp->out_of_order_queue) != NULL)) {
+ !skb_queue_empty(&tp->out_of_order_queue) ||
+ /*
+ * With TCP_NODELAY we allow at most two not acked packets to
+ * stay in the receive queue to allow better responsiveness
+ * without losing the nice delayed ack feature of merging the
+ * ack in the next packet sent. -Andrea
+ */
+ (sk->nonagle == 1 && tcp_nr_packets_not_acked(2, sk, tp))) {
/* Then ack it now */
tcp_send_ack(sk);
} else {
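
For anyone who wants to play with the counting logic outside the kernel, here
is a userspace model of the receive-queue walk. It is only a sketch: the
queue is replaced by a plain array of end sequence numbers (oldest first),
the names seq_after() and nr_segments_not_acked() are invented for this
example, and the comparison is done wrap-safe in the style of the kernel's
after() helper:

```c
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "seq1 is after seq2" for 32-bit TCP sequence numbers. */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) < 0;
}

/* Model of the check: end_seq[] holds the end sequence number of each
 * queued segment, oldest first. Walk from the newest segment backwards
 * and return 1 as soon as nr segments beyond last_ack_sent are found. */
int nr_segments_not_acked(unsigned int nr, const uint32_t *end_seq,
                          size_t n, uint32_t last_ack_sent)
{
    unsigned int count = 0;

    while (n-- > 0) {
        if (!seq_after(end_seq[n], last_ack_sent))
            break; /* this segment and everything older is already acked */
        if (++count >= nr)
            return 1;
    }
    return 0;
}
```

The walk stops at the first already-acked segment, so the cost is bounded by
nr rather than by the length of the queue.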

Andrea Arcangeli
