Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl

From: Leon Hwang

Date: Wed Feb 25 2026 - 04:48:56 EST




On 25/2/26 16:31, Eric Dumazet wrote:
> On Wed, Feb 25, 2026 at 8:46 AM Leon Hwang <leon.huangfu@xxxxxxxxxx> wrote:
>>
>> Introduce a new sysctl knob, net.ipv4.tcp_purge_receive_queue, to
>> address a memory leak scenario related to TCP sockets.
>
> We use the term "memory leak" for a persistent loss of memory (until reboot)
>

Thanks for the clarification.

> Let's not abuse it and confuse various AI/human agents, which will
> declare emergency situations caused by a nonexistent fatal error.
>

I'll reword it in the next revision.

>>
>> Issue:
>> When a TCP socket in the CLOSE_WAIT state receives a RST packet, the
>> current implementation does not clear the socket's receive queue. This
>> causes SKBs in the queue to remain allocated until the socket is
>> explicitly closed by the application. As a consequence:
>>
>> 1. The page pool pages held by these SKBs are not released.
>
> This situation also applies to normal TCP_ESTABLISHED sockets, when
> applications do not drain the receive queue.
>
> As long as the application has not called close(), the kernel should
> not assume the application will _not_ read the data that was received.
>

Understood.

This patch provides an option to drain the receive queue in the
CLOSE_WAIT + RST case, instead of purging it unconditionally upon
receiving a RST packet.

>
>> 2. The associated page pool cannot be freed.
>>
>> RFC 9293 Section 3.10.7.4 specifies that when a RST is received in
>> CLOSE_WAIT state, "all segment queues should be flushed." However, the
>> current implementation does not flush the receive queue.
>
> Some buggy stacks send RST anyway after FIN. I think that forcibly
> purging good data received before the RST would add many surprises.
>

Understood.

There is a tcp_write_queue_purge(sk) call in tcp_done_with_error(),
which means sk_write_queue is always purged when a RST packet is
received. I assume the reason for purging sk_write_queue is that any
pending transmissions become meaningless once a RST is received.

Would it be better to defer skb_queue_purge(&sk->sk_receive_queue) until
after tcp_done_with_error()?
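
To make the ordering in question concrete, here is a rough userspace toy
model. It is not kernel code: struct toy_sock, toy_done_with_error() and
toy_reset() are hypothetical stand-ins, and the queues are reduced to
plain counters. It only illustrates "purge the write queue on the error
path, then optionally purge the receive queue afterwards".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_sock {
	size_t write_queue_len;   /* stands in for sk->sk_write_queue */
	size_t receive_queue_len; /* stands in for sk->sk_receive_queue */
	bool done;                /* set once the error path has run */
};

/* Models the behaviour described above: the error path purges the
 * write queue and leaves the receive queue untouched. */
static void toy_done_with_error(struct toy_sock *sk)
{
	sk->write_queue_len = 0;
	sk->done = true;
}

/* Models the question: run the error path first, then purge the
 * receive queue only if the (assumed) opt-in knob is enabled. */
static void toy_reset(struct toy_sock *sk, bool purge_receive_queue)
{
	toy_done_with_error(sk);
	if (purge_receive_queue)
		sk->receive_queue_len = 0;
}
```

With the knob off, data queued before the RST stays readable by the
application, as it does today; with it on, the data is dropped only
after the error path has completed.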

[...]

>>
>
> Please prepare a packetdrill test.

Ack.

I'll add a packetdrill test in the next revision.

Thanks,
Leon