Re: [PATCH net-next 3/3] net: tcp: handle window shrink properly

From: Menglong Dong
Date: Wed May 24 2023 - 08:16:23 EST


On Tue, May 23, 2023 at 9:27 PM Neal Cardwell <ncardwell@xxxxxxxxxx> wrote:
> > Oh, I understand what you mean now. You are saying that
> > retransmit that first packet in the retransmit queue instead
> > of zero-window probe packet when OOM of the receiver,
> > isn't it? In other word, retransmit the unacked data and ignore
> > the tcp_retries2 when we find the receiver is in OOM state.
>
> Yes. The idea would be to use a heuristic to estimate the receiver is
> currently OOM and use ICSK_TIME_PROBE0 / tcp_probe_timer() /
> tcp_write_wakeup() in this case instead of ICSK_TIME_RETRANS /
> tcp_retransmit_timer().
>

Well, I think that maybe we should use ICSK_TIME_PROBE0 /
tcp_probe_timer() / tcp_retransmit_skb(), isn't it?

What tcp_write_wakeup() does is send new data if the receive
window available, which means push new data into the retransmit
queue. However, what we need to do now is retransmit the first
packet in the rtx queue, isn't it?

In the tcp_ack(), we estimate that if the receiver is OOM and
mark the sk with OOM state, and raise ICSK_TIME_PROBE0.
When new data is acked, we leave the OOM state.

The OOM state can only happen when the rtx queue is not empty,
otherwise the tcp connection will enter normal zero-window probe
state. So when the timeout of ICSK_TIME_PROBE0, we need
retransmit the skb in the rtx queue.

tcp_write_wakeup() don't do the job the retransmit packet, but
send new data.

Am I right?

Thanks!
Menglong Dong

> > That's an option, and we can make the length of the data we
> > send to 1 byte, which means we keep retransmitting the first
> > byte that has not be acked in the retransmit queue.
>
> I don't think it would be worth adding special-case code to only send
> 1 byte. If the data receiver is not OOM then for CPU and memory
> efficiency reasons (as well as simplicity) the data sender should send
> it a full MSS. So for those reasons I would suggest that in this
> potential approach tcp_write_wakeup() should stay the same.
>
> neal