Re: [PATCH net-next 2/3] net: tcp: send zero-window when no memory

From: Eric Dumazet
Date: Wed May 17 2023 - 10:45:16 EST


On Wed, May 17, 2023 at 2:42 PM <menglong8.dong@xxxxxxxxx> wrote:
>
> From: Menglong Dong <imagedong@xxxxxxxxxxx>
>
> For now, an skb is dropped when there is no memory, which makes the client
> keep retransmitting until timeout, and that is not friendly to users.

Yes, networking needs memory. Trying to deny it is a recipe for OOM.

>
> Therefore, we now force the current socket to receive one packet when
> the protocol memory is over its limit. The socket then stays in
> 'no mem' status until protocol memory is available.
>

I think you missed one old patch.

commit ba3bb0e76ccd464bb66665a1941fabe55dadb3ba
("tcp: fix SO_RCVLOWAT possible hangs under high mem pressure")



> When a socket is in 'no mem' status, its receive window becomes 0,
> which means a window shrink happens. The sender needs to handle
> such a window shrink properly, which is done in the next commit.
>
> Signed-off-by: Menglong Dong <imagedong@xxxxxxxxxxx>
> ---
> include/net/sock.h | 1 +
> net/ipv4/tcp_input.c | 12 ++++++++++++
> net/ipv4/tcp_output.c | 7 +++++++
> 3 files changed, 20 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 5edf0038867c..90db8a1d7f31 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -957,6 +957,7 @@ enum sock_flags {
> SOCK_XDP, /* XDP is attached */
> SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
> SOCK_RCVMARK, /* Receive SO_MARK ancillary data with packet */
> + SOCK_NO_MEM, /* protocol memory limitation happened */
> };
>
> #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index a057330d6f59..56e395cb4554 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5047,10 +5047,22 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
> if (skb_queue_len(&sk->sk_receive_queue) == 0)
> sk_forced_mem_schedule(sk, skb->truesize);

I think you missed this part: we accept at least one packet,
regardless of memory pressure, if the queue is empty.

So your changelog is misleading.
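
To make the existing behavior concrete, here is a minimal userspace model
of the admission decision in the quoted hunk. It is only a sketch, not
kernel code: rx_admit(), enum rx_verdict and the mem_ok flag are
hypothetical names standing in for the skb_queue_len() check,
sk_forced_mem_schedule() and the result of tcp_try_rmem_schedule().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts for an incoming in-order packet. */
enum rx_verdict {
	RX_QUEUE_FORCED,	/* queued even without memory (empty queue) */
	RX_QUEUE,		/* queued normally, memory was available */
	RX_DROP,		/* dropped because of protocol memory limits */
};

/*
 * Model of the current (pre-patch) decision:
 *  - an empty receive queue always accepts one packet, which stands in
 *    for the sk_forced_mem_schedule() call;
 *  - otherwise the packet is queued only if memory can be charged, which
 *    stands in for tcp_try_rmem_schedule() succeeding.
 */
static enum rx_verdict rx_admit(size_t queue_len, bool mem_ok)
{
	if (queue_len == 0)
		return RX_QUEUE_FORCED;
	if (!mem_ok)
		return RX_DROP;
	return RX_QUEUE;
}
```

In this model a packet is dropped only when something is already queued,
which is why the "skb will be dropped when no memory" wording does not
describe the empty-queue case.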

> else if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
> + if (sysctl_tcp_wnd_shrink)

We no longer add global sysctls for TCP. All new sysctls must be per net-ns.

> + goto do_wnd_shrink;
> +
> reason = SKB_DROP_REASON_PROTO_MEM;
> NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
> sk->sk_data_ready(sk);
> goto drop;
> +do_wnd_shrink:
> + if (sock_flag(sk, SOCK_NO_MEM)) {
> + NET_INC_STATS(sock_net(sk),
> + LINUX_MIB_TCPRCVQDROP);
> + sk->sk_data_ready(sk);
> + goto out_of_window;
> + }
> + sk_forced_mem_schedule(sk, skb->truesize);

So now we would accept two packets per TCP socket, and yet EPOLLIN
will not be sent in time?

Packets can consume about 45*4K each (roughly 180 KB), so I do not think
it is wise to double receive queue sizes.

What you want instead is simply to send EPOLLIN sooner (when the first
packet is queued, instead of when the second packet is dropped)
by changing sk_forced_mem_schedule() a bit.

This might matter for applications using SO_RCVLOWAT, but not for
other applications.
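
For reference, this is roughly how an application opts into SO_RCVLOWAT.
It is a minimal userspace sketch: set_rcvlowat() is a hypothetical helper
name, and it uses only the standard setsockopt()/getsockopt() calls. With
the option set, the socket is reported readable only once at least that
many bytes are queued, so a receiver that cannot queue enough data under
memory pressure may wait a long time for EPOLLIN.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Set the receive low-water mark on a socket and return the value the
 * kernel reports back, or -1 if either call fails.
 * The kernel may adjust the requested value, so the result is read back
 * with getsockopt() rather than assumed.
 */
static int set_rcvlowat(int fd, int bytes)
{
	int val = bytes;
	socklen_t len = sizeof(val);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &val, sizeof(val)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &val, &len) < 0)
		return -1;
	return val;
}
```

A caller would typically create a TCP socket, call set_rcvlowat(fd, 4096)
and then wait for readability with poll() or epoll as usual.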