[BUG] io_uring: possible CQE32 overflow flush inconsistency in __io_cqring_overflow_flush()
From: Cyber_black
Date: Fri Jun 19 2026 - 00:50:03 EST
Hi,
I believe there is a bug in __io_cqring_overflow_flush() in io_uring/io_uring.c
where `is_cqe32` and `cqe_size` are left in an inconsistent state when
IORING_SETUP_CQE32 is set, potentially leading to an out-of-bounds write into
the CQ ring.
AFFECTED FILE
=============
io_uring/io_uring.c
Function: __io_cqring_overflow_flush()
KERNEL VERSION
==============
Observed in current upstream (v6.8+). Please confirm against your tree.
FOUND IN
========
https://github.com/torvalds/linux/blob/master/io_uring/io_uring.c
DESCRIPTION
===========
Inside the flush loop, `cqe_size` and `is_cqe32` are both initialized and then
conditionally updated:
    size_t cqe_size = sizeof(struct io_uring_cqe);  /* 16 bytes */
    bool is_cqe32 = false;

    /* Block A */
    if (ocqe->cqe.flags & IORING_CQE_F_32 ||
        ctx->flags & IORING_SETUP_CQE32) {
        is_cqe32 = true;
        cqe_size <<= 1;  /* cqe_size = 32 bytes */
    }

    /* Block B */
    if (ctx->flags & IORING_SETUP_CQE32)
        is_cqe32 = false;  /* only is_cqe32 reset, cqe_size NOT reset */

    if (!dying) {
        if (!io_get_cqe_overflow(ctx, &cqe, true, is_cqe32))
            break;
        memcpy(cqe, &ocqe->cqe, cqe_size);
    }
When IORING_SETUP_CQE32 is set, Block A correctly doubles cqe_size to 32 and
sets is_cqe32 = true. Block B then resets is_cqe32 back to false, but leaves
cqe_size at 32.
This means, assuming is_cqe32 alone determines the slot size:
  - io_get_cqe_overflow() is called with is_cqe32 = false
    → it returns a pointer to a 16-byte CQE slot in the ring
  - memcpy() then copies cqe_size = 32 bytes into that 16-byte slot
    → the 16 bytes following the slot are overwritten
The destination `cqe` points directly into the shared CQ ring memory
(ctx->rings->cqes[]), so the excess bytes corrupt the adjacent CQE entry.
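If the slot being written is the last one in the ring, the copy runs past the
end of cqes[]. As far as I can tell cqes[] is the trailing member of struct
io_rings, so the excess bytes would land in whatever follows it in the ring
allocation (possibly the SQ index array) rather than in fields such as
sq_flags or cq_flags, which precede cqes[].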
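
To make the claimed overrun concrete, here is a minimal userspace sketch of the
copy. It is not kernel code: struct model_cqe, struct model_big_cqe,
MODEL_RING_ENTRIES and model_flush_one() are hypothetical stand-ins, and the
sketch assumes, as this report does, that the destination slot is 16 bytes
wide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
struct model_cqe {              /* 16 bytes, like a base CQE */
    uint64_t user_data;
    int32_t  res;
    uint32_t flags;
};

struct model_big_cqe {          /* 32-byte payload held by an overflow entry */
    struct model_cqe cqe;
    uint64_t extra[2];
};

_Static_assert(sizeof(struct model_cqe) == 16, "base CQE model must be 16 bytes");
_Static_assert(sizeof(struct model_big_cqe) == 32, "big CQE model must be 32 bytes");

#define MODEL_RING_ENTRIES 4

/*
 * Copy copy_size bytes of *src into slot idx of a ring made of 16-byte slots.
 * Returns the number of bytes that landed beyond slot idx, or (size_t)-1 if
 * the copy would leave the model ring (the model refuses to do a real
 * out-of-bounds write).
 */
static size_t model_flush_one(struct model_cqe *ring, size_t idx,
                              const struct model_big_cqe *src, size_t copy_size)
{
    const size_t slot_size = sizeof(ring[0]);

    if (idx >= MODEL_RING_ENTRIES || copy_size > sizeof(*src))
        return (size_t)-1;
    if (copy_size > (MODEL_RING_ENTRIES - idx) * slot_size)
        return (size_t)-1;

    memcpy((unsigned char *)ring + idx * slot_size, src, copy_size);
    return copy_size > slot_size ? copy_size - slot_size : 0;
}
```

Calling model_flush_one(ring, 1, &src, 32) reports 16 spilled bytes and leaves
src.extra[0] in ring[2].user_data, which is the neighbour corruption described
above. If the real slot is in fact 32 bytes wide on a CQE32 ring, the spill is
zero and this model does not apply.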
IMPACT
======
On a ring configured with IORING_SETUP_CQE32, flushing the overflow list
causes silent corruption of adjacent CQE entries (or adjacent ring metadata).
This can manifest as:
- Userspace receiving garbled CQE data (wrong user_data, res, flags)
- Link chains or multishot requests making decisions based on corrupt
  completions
- Unpredictable kernel behavior if ring metadata is overwritten
- Potential data integrity issues in applications relying on io_uring with CQE32
STEPS TO REPRODUCE
==================
1. Create an io_uring instance with IORING_SETUP_CQE32.
2. Submit enough requests to fill the CQ ring and trigger overflow
   (i.e., force entries onto ctx->cq_overflow_list).
3. Reap some CQEs to make room, then call io_uring_enter() with
   IORING_ENTER_GETEVENTS so that __io_cqring_overflow_flush() runs with
   dying == false. Closing the ring also flushes the list, but on the dying
   path, which skips the memcpy() in the excerpt above.
4. Observe that the CQE following the flushed entry (or ring metadata) is
   silently overwritten. This can be verified by reading the CQ ring from
   userspace.
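
For the verification in step 4, one option is to tag each request with a
unique user_data value and check the reaped completions afterwards. The helper
below is a sketch of that check only. It assumes the submitter used the tags
0 .. n_submitted-1 and that the caller has already collected the observed
user_data values into an array. count_cqe_anomalies() is a hypothetical name,
and the submission and reaping side is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Count anomalies in the user_data values read back from the CQ ring.
 * An anomaly is a value outside 0 .. n_submitted-1, a tag seen more than
 * once, or a tag that never showed up. Returns -1 if the scratch buffer
 * cannot be allocated.
 */
static long count_cqe_anomalies(const uint64_t *seen, size_t n_seen,
                                size_t n_submitted)
{
    unsigned char *hit = calloc(n_submitted ? n_submitted : 1, 1);
    long anomalies = 0;

    if (!hit)
        return -1;

    for (size_t i = 0; i < n_seen; i++) {
        if (seen[i] >= n_submitted)
            anomalies++;            /* garbled or unknown tag */
        else if (hit[seen[i]])
            anomalies++;            /* same tag completed twice */
        else
            hit[seen[i]] = 1;
    }
    for (size_t tag = 0; tag < n_submitted; tag++) {
        if (!hit[tag])
            anomalies++;            /* completion lost or overwritten */
    }

    free(hit);
    return anomalies;
}
```

A result of 0 means every tag came back exactly once. A non-zero result after
an overflow flush would be consistent with the corruption described here, but
it does not by itself prove the cause.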
SUSPECTED ROOT CAUSE
====================
Block B appears to have been added to handle IORING_SETUP_CQE_MIXED, where the
ctx-level CQE32 flag should not be passed down to io_get_cqe_overflow() (since
in mixed mode the slot size is determined per-entry by the flag, not globally).
However, Block B resets only is_cqe32 and not cqe_size, creating the
inconsistency.
PROPOSED FIX
============
If Block B is intentional (i.e. io_get_cqe_overflow already handles CQE32 slot
sizing internally when IORING_SETUP_CQE32 is set), then cqe_size must also be
reset:
    if (ctx->flags & IORING_SETUP_CQE32) {
        is_cqe32 = false;
        cqe_size = sizeof(struct io_uring_cqe);  /* undo Block A */
    }
Alternatively, if Block B is dead/incorrect code, it should be removed entirely
and io_get_cqe_overflow() called with is_cqe32 = true when appropriate.
The correct fix depends on the intended semantics of is_cqe32 vs ctx flag
inside io_get_cqe_overflow(), which the maintainer is best placed to confirm.
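
To compare the options side by side, the following sketch models only the
Block A / Block B decision logic in userspace. The flag macros are placeholder
bits, not the uapi values, and struct model_flush_state and
model_compute_flush_state() are hypothetical names introduced for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bits for the model; these are NOT the real uapi values. */
#define MODEL_SETUP_CQE32 (1u << 0)
#define MODEL_CQE_F_32    (1u << 1)
#define MODEL_CQE_SIZE    ((size_t)16)

struct model_flush_state {
    bool   is_cqe32;   /* value that would be passed to io_get_cqe_overflow() */
    size_t cqe_size;   /* byte count that would be passed to memcpy() */
};

/*
 * Replay Block A and Block B for one overflow entry.
 * reset_size_in_block_b selects the first proposed fix (also restoring
 * cqe_size in Block B); false reproduces the logic as quoted in this report.
 */
static struct model_flush_state
model_compute_flush_state(unsigned ctx_flags, unsigned cqe_flags,
                          bool reset_size_in_block_b)
{
    struct model_flush_state st = { false, MODEL_CQE_SIZE };

    /* Block A */
    if ((cqe_flags & MODEL_CQE_F_32) || (ctx_flags & MODEL_SETUP_CQE32)) {
        st.is_cqe32 = true;
        st.cqe_size <<= 1;
    }

    /* Block B */
    if (ctx_flags & MODEL_SETUP_CQE32) {
        st.is_cqe32 = false;
        if (reset_size_in_block_b)
            st.cqe_size = MODEL_CQE_SIZE;
    }

    return st;
}
```

With MODEL_SETUP_CQE32 set, the quoted logic yields is_cqe32 = false with
cqe_size = 32, and the first fix yields is_cqe32 = false with cqe_size = 16.
Note that the first fix would then copy only half of a 32-byte CQE, so it is
only right if the slot really is 16 bytes. If io_get_cqe_overflow() already
hands out 32-byte slots on a CQE32 ring, the existing pairing may be
intentional.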
RELEVANT CODE (verbatim)
========================
--- a/io_uring/io_uring.c (v6.8)
__io_cqring_overflow_flush(), lines ~541-552:
    if (ocqe->cqe.flags & IORING_CQE_F_32 ||
        ctx->flags & IORING_SETUP_CQE32) {
        is_cqe32 = true;
        cqe_size <<= 1;
    }
    if (ctx->flags & IORING_SETUP_CQE32)
        is_cqe32 = false;  /* BUG: cqe_size not restored */

    if (!dying) {
        if (!io_get_cqe_overflow(ctx, &cqe, true, is_cqe32))
            break;
        memcpy(cqe, &ocqe->cqe, cqe_size);  /* OOB if slot < cqe_size */
    }
Thanks for looking into this.
Best Regards
Eneshan Erdoğan Karaca.