[PATCH net-next 10/10] rxrpc: Transmit more ACKs during data reception

From: David Howells
Date: Wed Aug 01 2018 - 08:59:55 EST


Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() -
which transfers data to the target buffers - if we previously had an Rx
underrun (ie. we returned -EAGAIN because we ran out of received data).
This lets the server know what we've managed to receive something.

Also flush any outstanding ACK after calling the function if it hit -EAGAIN
to let the server know we processed some data.

It might be better to send more ACKs, possibly on a time-based scheme, but
that needs some more consideration.

With this and some additional AFS patches, it is possible to get large
unencrypted O_DIRECT reads to be almost as fast as NFS over TCP. It looks
like it might be theoretically possible to improve performance yet more for
a server running a single operation as investigation of packet timestamps
indicates that the server keeps stalling.

The issue appears to be that rxrpc runs in to trouble with ACK packets
getting batched together (up to ~32 at a time) somewhere between the IP
transmit queue on the client and the ethernet receive queue on the server.

However, this case isn't too much of a worry as even a lightly loaded
server should be receiving sufficient packet flux to flush the ACK packets
to the UDP socket.

Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
---

net/rxrpc/ar-internal.h | 1 +
net/rxrpc/recvmsg.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e791d35ee34b..9d9278a13d91 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -479,6 +479,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout occurred */
RXRPC_CALL_BEGAN_RX_TIMER, /* We began the expect_rx_by timer */
RXRPC_CALL_RX_HEARD, /* The peer responded at least once to this call */
+ RXRPC_CALL_RX_UNDERRUN, /* Got data underrun */
};

/*
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 02f1f768e16a..a57ea96c84ea 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -313,6 +313,10 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
unsigned int rx_pkt_offset, rx_pkt_len;
int ix, copy, ret = -EAGAIN, ret2;

+ if (test_and_clear_bit(RXRPC_CALL_RX_UNDERRUN, &call->flags) &&
+ call->ackr_reason)
+ rxrpc_send_ack_packet(call, false, NULL);
+
rx_pkt_offset = call->rx_pkt_offset;
rx_pkt_len = call->rx_pkt_len;

@@ -412,6 +416,8 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
done:
trace_rxrpc_recvmsg(call, rxrpc_recvmsg_data_return, seq,
rx_pkt_offset, rx_pkt_len, ret);
+ if (ret == -EAGAIN)
+ set_bit(RXRPC_CALL_RX_UNDERRUN, &call->flags);
return ret;
}

@@ -684,6 +690,17 @@ int rxrpc_kernel_recv_data(struct socket *sock, struct rxrpc_call *call,
read_phase_complete:
ret = 1;
out:
+ switch (call->ackr_reason) {
+ case RXRPC_ACK_IDLE:
+ break;
+ case RXRPC_ACK_DELAY:
+ if (ret != -EAGAIN)
+ break;
+ /* Fall through */
+ default:
+ rxrpc_send_ack_packet(call, false, NULL);
+ }
+
if (_service)
*_service = call->service_id;
mutex_unlock(&call->user_mutex);