Re: [PATCH] io_uring/net: don't fail linked ops when done_io > 0

From: Hannes Furmans

Date: Fri Feb 27 2026 - 11:25:40 EST


Hi Stefan,

On 27.02.26 at 14:59, Stefan Metzmacher wrote:
> That's by design: if a MSG_WAITALL call fails, it means
> not all data the caller expected arrived or was sent.
> When there's a LINK after that the linked operation likely
> relies on all expected data being processed! Otherwise
> the message stream can get out of sync and cause corruption.

You're right — a short MSG_WAITALL read should sever the IO_LINK
chain. The v1 patch was wrong to guard req_set_fail() on done_io > 0.

> Let's assume I want to send a message header with
> IO_SEND linked with an IO_SPLICE to send the payload.
>
> If IO_SEND returns short the situation needs to be
> recovered by the caller instead of letting the
> IO_SPLICE give more data to the socket.

Agreed, the linked operation expects the complete data.

> So the current behavior is exactly what MSG_WAITALL
> gives you. If you don't want that why are you using it
> at all?

The actual bug is narrower. I traced the root cause with kTLS.

When IORING_OP_RECV is used with MSG_WAITALL on a kTLS socket,
the recv completes successfully (ret >= min_ret, full requested
amount received). But kTLS calls put_cmsg(SOL_TLS,
TLS_GET_RECORD_TYPE) for the first record of every recvmsg call
(tls_sw.c:1843). Since io_recv sets up the msghdr with
msg_control=NULL and msg_controllen=0, put_cmsg has nowhere to
store the cmsg and sets MSG_CTRUNC instead.

Then io_recv hits the else-if branch:

	} else if ((flags & MSG_WAITALL) &&
		   (msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
		req_set_fail(req);
	}

This sets REQ_F_FAIL on a fully successful recv. The CQE shows
the full byte count, but the linked write gets -ECANCELED.

I confirmed this with ftrace — the recv completes with
result=67108864 (exactly 64MB requested), then
io_uring_fail_link fires immediately after from an io-wq worker.
I also confirmed with a plain recvmsg debug tool that kTLS
returns msg_flags=0x88 (MSG_EOR | MSG_CTRUNC) on every call.
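
For anyone without a kTLS setup, the general mechanism (the kernel has
ancillary data to deliver, the caller passed no control buffer, so
MSG_CTRUNC is reported) can probably be seen with a plain AF_UNIX
socketpair. The sketch below is only a stand-in under that assumption:
it uses SCM_RIGHTS instead of TLS_GET_RECORD_TYPE, it is not the debug
tool mentioned above, and the helper name is made up for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one byte plus an SCM_RIGHTS cmsg over an
 * AF_UNIX socketpair, then receive it with msg_control=NULL and
 * msg_controllen=0 (the same msghdr shape described for io_recv).
 * Returns the msg_flags reported by recvmsg(), or -1 on error.
 */
static int msg_flags_without_cmsg_buffer(void)
{
	int sv[2];
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces cmsghdr alignment */
	} ctrl;
	struct msghdr tx, rx;
	struct cmsghdr *cmsg;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&tx, 0, sizeof(tx));
	memset(&rx, 0, sizeof(rx));

	/* Sender side: attach one file descriptor as ancillary data. */
	tx.msg_iov = &iov;
	tx.msg_iovlen = 1;
	tx.msg_control = ctrl.buf;
	tx.msg_controllen = sizeof(ctrl.buf);
	cmsg = CMSG_FIRSTHDR(&tx);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &sv[0], sizeof(int));

	if (sendmsg(sv[0], &tx, 0) == 1) {
		/* Receiver side: data buffer only, no control buffer. */
		rx.msg_iov = &iov;
		rx.msg_iovlen = 1;
		if (recvmsg(sv[1], &rx, 0) == 1)
			ret = rx.msg_flags;
	}

	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

On Linux I would expect the returned flags to include MSG_CTRUNC even
though the one data byte arrives intact, which is the same situation
the recv above ends up in.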

Your commit 0031275d119e says "For IORING_OP_RECVMSG we also
check for the MSG_TRUNC and MSG_CTRUNC flags" but the code
applies the check to IORING_OP_RECV as well. MSG_CTRUNC is
meaningful for IORING_OP_RECVMSG (user provides a cmsg buffer).
It's meaningless for IORING_OP_RECV which never has a cmsg
buffer.

I'll send a v2 that removes only MSG_CTRUNC from the io_recv
check and leaves the MSG_TRUNC handling as it is.
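
To make the intended difference concrete, here is a small user-space
model of the two conditions. It is a sketch, not the v2 patch: the
function names are invented for illustration, and it only mirrors the
else-if quoted above next to the narrower MSG_TRUNC-only variant.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Condition as quoted above: either truncation flag fails the request. */
static bool waitall_fails_current(int flags, int msg_flags)
{
	return (flags & MSG_WAITALL) &&
	       (msg_flags & (MSG_TRUNC | MSG_CTRUNC));
}

/* Narrower variant (assumed shape of the change): MSG_CTRUNC is ignored. */
static bool waitall_fails_without_ctrunc(int flags, int msg_flags)
{
	return (flags & MSG_WAITALL) && (msg_flags & MSG_TRUNC);
}
```

With the flags kTLS reports (MSG_EOR | MSG_CTRUNC), the first function
returns true and the second returns false, while a real MSG_TRUNC still
fails the request in both.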

Thanks,
Hannes