Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends

From: Jakub Kicinski

Date: Sat Feb 28 2026 - 12:16:01 EST


On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> tx_work_handler calls tls_tx_records with flags=-1, which preserves
> each record's original tx_flags but results in tcp_sendmsg_locked
> using an infinite send timeout. When the peer is unresponsive and the
> send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> cancel_delayed_work_sync waiting for tx_work_handler to finish,
> leading to a hung task:
>
> INFO: task ...: blocked for more than ... seconds.
> Call Trace:
> cancel_delayed_work_sync
> tls_sw_cancel_work_tx
> tls_sk_proto_close
>
> A workqueue handler should never block indefinitely. Fix this by
> introducing __tls_tx_records() with an extra_flags parameter that
> gets OR'd into each record's tx_flags. tx_work_handler uses this to
> pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> when the send buffer is full, without overwriting the original
> per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> existing reschedule mechanism retries after a short delay.
>
> Also consolidate the two identical reschedule paths (lock contention
> and -EAGAIN) into one.

It's not that simple. The default semantics for TCP sockets are that
queuing data and then calling close() is a legitimate thing to do,
and in that case the data should be sent cleanly, followed by a
normal FIN.

Maybe we should explore making sure we have enough wmem before we
start creating records. Get rid of the entire workqueue mess?

Regarding your patch, I think all callers passing -1 as flags are on
the close path, so you could have just added | MSG_DONTWAIT if the
flags are -1.
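
Roughly the following, written as a standalone userspace model rather
than a real diff. pick_tx_flags() is a made-up name used only for
illustration, and the "-1 means use the record's own flags" convention
is taken from the patch description above; untested against the tree.

```c
#include <sys/socket.h>	/* MSG_DONTWAIT, MSG_MORE, MSG_NOSIGNAL */

/*
 * Userspace model only; pick_tx_flags() is a hypothetical helper,
 * not a kernel function.
 *
 * flags == -1 means "use the flags stored in the record". In that
 * case keep the per-record flags and OR in MSG_DONTWAIT so the send
 * cannot block. Any other value is passed through unchanged.
 */
static int pick_tx_flags(int flags, int rec_tx_flags)
{
	if (flags == -1)
		return rec_tx_flags | MSG_DONTWAIT;

	return flags;
}
```

With that, a record queued with MSG_MORE still goes out with MSG_MORE
set, plus MSG_DONTWAIT, and callers that pass explicit flags see no
change. No extra parameter or wrapper function is needed.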
--
pw-bot: cr