Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl

From: Leon Hwang

Date: Tue Mar 03 2026 - 03:58:57 EST

On 3/3/26 16:17, Eric Dumazet wrote:
> On Tue, Mar 3, 2026 at 8:55 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>>
>>
>>
>> On 3/3/26 14:26, Leon Hwang wrote:
>>>
>>>
>>> On 3/3/26 11:55, Eric Dumazet wrote:
>>>> On Tue, Mar 3, 2026 at 3:12 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 3/3/26 08:22, Jakub Kicinski wrote:
>>>>>> On Mon, 2 Mar 2026 17:55:59 +0800 Leon Hwang wrote:
>>>>>>> On 26/2/26 09:43, Jakub Kicinski wrote:
>>>>>>>> On Wed, 25 Feb 2026 15:46:33 +0800 Leon Hwang wrote:
>>>>>>>>> Issue:
>>>>>>>>> When a TCP socket in the CLOSE_WAIT state receives a RST packet, the
>>>>>>>>> current implementation does not clear the socket's receive queue. This
>>>>>>>>> causes SKBs in the queue to remain allocated until the socket is
>>>>>>>>> explicitly closed by the application. As a consequence:
>>>>>>>>>
>>>>>>>>> 1. The page pool pages held by these SKBs are not released.
>>>>>>>>
>>>>>>>> On what kernel version and driver are you observing this?
>>>>>>>
>>>>>>> # uname -r
>>>>>>> 6.19.0-061900-generic
>>>>>>>
>>>>>>> # ethtool -i eth0
>>>>>>> driver: mlx5_core
>>>>>>> version: 6.19.0-061900-generic
>>>>>>> firmware-version: 26.43.2566 (MT_0000000531)
>>>>>>
>>>>>> Okay... this kernel + driver should just patiently wait for the page
>>>>>> pool to go away.
>>>>>>
>>>>>> What is the actual, end user problem that you're trying to solve?
>>>>>> A few kB of data waiting to be freed is not a huge problem..
>>>>>
>>>>> Yes, it is not a huge problem.
>>>>>
>>>>> The actual end-user issue was discussed in
>>>>> "page_pool: Add page_pool_release_stalled tracepoint" [1].
>>>>>
>>>>> I think it would be useful to provide a way for SREs to purge the
>>>>> receive queue when CLOSE_WAIT TCP sockets receive RST packets. If the
>>>>> NIC, e.g., Mellanox, flaps, the underlying page pool and pages can be
>>>>> released at the same time.
>>>>>
>>>>> Links:
>>>>> [1]
>>>>> https://lore.kernel.org/netdev/b676baa0-2044-4a74-900d-f471620f2896@xxxxxxxxx/
>>>>
>>>> Perhaps SRE could use this in an emergency?
>>>>
>>>> ss -t -a state close-wait -K
>>>
>>> This ss command is acceptable in an emergency.
>>>
>>
>> However, once a CLOSE_WAIT TCP socket receives an RST packet, it
>> transitions to the CLOSE state. A socket in the CLOSE state cannot be
>> killed using the ss approach.
>>
>> The SKBs remain in the receive queue of the CLOSE socket until it is
>> closed by the user-space application.
>
> Why user-space application does not drain the receive queue ?
>
> Is there a missing EPOLLIN or something ?

The user-space application uses a TCP connection pool. It establishes
several TCP connections at startup and keeps them in the pool.

However, the application does not always drain their receive queues.
Instead, it selects one connection from the pool using a hash algorithm
for communication with the TCP server. When it attempts to write data
through a socket in the CLOSE state, it receives -EPIPE and then closes
it. As a result, TCP connections whose underlying socket state is CLOSE
may retain an SKB in their receive queues if they are not selected for
communication.

I proposed a solution to address this issue: close the TCP connection if
the underlying sk_err is non-zero.

Thanks,
Leon