Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()

From: Menglong Dong
Date: Thu Jul 27 2023 - 22:57:46 EST


On Fri, Jul 28, 2023 at 3:31 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Thu, Jul 27, 2023 at 2:52 PM <menglong8.dong@xxxxxxxxx> wrote:
> >
> > From: Menglong Dong <imagedong@xxxxxxxxxxx>
> >
> > In tcp_retransmit_timer(), a window shrunk connection will be regarded
> > as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
> > right all the time.
> >
> > The retransmits will become zero-window probes in tcp_retransmit_timer()
> > if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
> > TCP_RTO_MAX sooner or later.
> >
> > However, the timer is not precise enough, as it base on timer wheel.
> > Sorry that I am not good at timer, but I know the concept of time-wheel.
> > The longer of the timer, the rougher it will be. So the timeout is not
> > triggered after TCP_RTO_MAX, but 122877ms as I tested.
> >
> > Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
> > once the RTO come up to TCP_RTO_MAX.
> >
> > Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
> > which is exact the timestamp of the timeout.
> >
> > Signed-off-by: Menglong Dong <imagedong@xxxxxxxxxxx>
> > ---
> > net/ipv4/tcp_timer.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> > index 470f581eedd4..3a20db15a186 100644
> > --- a/net/ipv4/tcp_timer.c
> > +++ b/net/ipv4/tcp_timer.c
> > @@ -511,7 +511,11 @@ void tcp_retransmit_timer(struct sock *sk)
> > tp->snd_una, tp->snd_nxt);
> > }
> > #endif
> > - if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) {
> > + /* It's a little rough here, we regard any valid packet that
> > + * update tp->rcv_tstamp as the reply of the retransmitted
> > + * packet.
> > + */
> > + if ((u32)icsk->icsk_timeout - tp->rcv_tstamp > TCP_RTO_MAX) {
> > tcp_write_err(sk);
> > goto out;
> > }
>
>
> Hmm, this looks like a net candidate, since this is unrelated to the
> other patches ?

Yeah, this patch can be standalone. However, considering the
purpose of this series, it is necessary. Without this patch, the
OOM probe will always timeout after a few minutes.

I'm not sure if I express the problem clearly in the commit log.
Let's explain it more.

Let's mark the timestamp of the 10th timeout of the rtx timer
as TS1. Now, the retransmission happens and the ACK of
the retransmitted packet will update the tp->rcv_tstamp to
TS1+rtt.

The RTO now is TCP_RTO_MAX. So let's see what will
happen in the 11th timeout. As we timeout after 122877ms,
so tcp_jiffies32 now is "TS1+122877ms", and
"tcp_jiffies32 - tp->rcv_tstamp" is
"TS1+122877ms - (TS1+rtt)" -> "122877ms - rtt",
which is always bigger than TCP_RTO_MAX, which is 120000ms.

>
> Neal, what do you think ?