[PATCH] RDMA/rxe: reject non-8-byte ATOMIC_WRITE payloads

From: Michael Bommarito

Date: Sat Apr 18 2026 - 12:23:48 EST


atomic_write_reply() in drivers/infiniband/sw/rxe/rxe_resp.c
unconditionally dereferences 8 bytes at payload_addr(pkt):

	value = *(u64 *)payload_addr(pkt);

check_rkey() previously accepted an ATOMIC_WRITE request with
pktlen == resid == 0 because the length validation only compared
pktlen against resid. A remote initiator that sets the RETH
length to 0 therefore reaches atomic_write_reply() with a
zero-byte logical payload, and the responder reads sizeof(u64)
bytes from past the logical end of the packet into skb->head
tailroom, then writes those 8 bytes into the attacker's MR via
rxe_mr_do_atomic_write(). That is a remote disclosure of 4 bytes
of kernel tailroom per probe (the other 4 bytes are the packet's
own trailing ICRC).
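
The over-read can be modelled in user space. The sketch below is
illustrative only: struct fake_pkt, build_zero_len_pkt(),
read_atomic_value() and the buffer sizes are hypothetical stand-ins,
not rxe code. It lays out headers, a 4-byte ICRC and some unrelated
tail bytes, then reads 8 bytes at the payload offset of a zero-length
payload, the same pattern as the unchecked dereference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes only (assumption): header bytes, ICRC, spare tail. */
enum { HDR_LEN = 28, ICRC_LEN = 4, TAILROOM = 16 };

/* Hypothetical stand-in for one received packet buffer. */
struct fake_pkt {
	uint8_t buf[HDR_LEN + ICRC_LEN + TAILROOM];
	size_t payload_off;	/* where the logical payload starts */
	size_t payload_len;	/* logical payload length, 0 in the bad case */
};

/* Zero-length payload: headers, then ICRC, then unrelated tail bytes. */
static void build_zero_len_pkt(struct fake_pkt *p)
{
	memset(p->buf, 0, sizeof(p->buf));
	p->payload_off = HDR_LEN;
	p->payload_len = 0;
	memset(p->buf + HDR_LEN, 0xAA, ICRC_LEN);	/* stand-in ICRC */
	memcpy(p->buf + HDR_LEN + ICRC_LEN, "root", 4);	/* stand-in tail */
}

/* Unchecked pattern: always copies 8 bytes, ignoring payload_len. */
static uint64_t read_atomic_value(const struct fake_pkt *p)
{
	uint64_t v;

	memcpy(&v, p->buf + p->payload_off, sizeof(v));
	return v;
}
```

In this model the first 4 bytes of the returned value are the
stand-in ICRC and the last 4 are bytes that were never part of the
packet.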

IBA oA19-28 defines ATOMIC_WRITE as exactly 8 bytes. Anything
else is protocol-invalid. Hoist a strict length check into
check_rkey() so the responder never reaches the unchecked
dereference, and keep the existing WRITE-family length logic for
the normal RDMA WRITE path.
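
As a rough illustration of the rule being enforced, the check can be
written as a standalone predicate. atomic_write_len_ok() is a
hypothetical helper name used only for this sketch; the patch itself
open-codes the test in check_rkey().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the driver): an ATOMIC_WRITE is
 * acceptable only if the requested length and the packet payload
 * are both exactly 8 bytes and the packet carries no pad bytes.
 */
static bool atomic_write_len_ok(size_t resid, size_t pktlen,
				unsigned int pad)
{
	return resid == sizeof(uint64_t) &&
	       pktlen == sizeof(uint64_t) &&
	       pad == 0;
}
```

Under this predicate the zero-length probe (resid == 0, pktlen == 0)
is rejected, as is any request with a non-zero pad count.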

Reproduced on mainline with an unmodified rxe driver: a
sustained zero-length ATOMIC_WRITE probe repeatedly leaks
adjacent skb head-buffer bytes into the attacker's MR,
including recognisable kernel strings and partial
kernel-direct-map pointer words. With this patch applied the
responder rejects the PDU and the MR stays all-zero.

Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service")
Cc: stable@xxxxxxxxxxxxxxx
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@xxxxxxxxx>
---
Previously reported to security@ (2026-04-18); reposting
publicly at the maintainer's request.

Per-probe evidence from a 100K-packet run on the clean
unpatched tree at 9ca18fc915c6 (single attacker QP against a
hairpin target QP over a veth pair; each probe one crafted
zero-length ATOMIC_WRITE PDU):

  transmitted packets:       100,000
  observed MR writes:         48,575
  non-zero leaked tails:      33,297  (68.55% of observed writes)
  mostly-printable tails:      3,796  (7.81%, includes fully-printable)
  fully-printable tails:       2,241  (4.61%)
  unique non-zero tails:      22,220

Each probe is a fresh skb head-buffer allocation, so the 4
attacker-visible bytes after the ICRC are an independent
sample of slab-adjacent memory. Content distribution across
the 48,575 observed writes: 31.45% zero, 4.61% fully
printable, 3.20% mostly but not fully printable, 12.06%
header/sentinel-looking (08004500, 08004508, ffffffff, ...),
48.68% other binary. 80.9% of unique non-zero tails were
singletons, so the leak is not dominated by one repeated value.
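
The harness's exact classification rules are not given here. The
sketch below is one plausible reading, under loud assumptions: "fully
printable" is taken to mean all 4 bytes satisfy isprint(), "mostly
printable" exactly 3 of 4, and the header/sentinel category is not
modelled. classify_tail() and enum tail_class are hypothetical names.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical categories; thresholds are assumptions, see above. */
enum tail_class {
	TAIL_ZERO,
	TAIL_FULLY_PRINTABLE,
	TAIL_MOSTLY_PRINTABLE,
	TAIL_OTHER,
};

/* Classify one 4-byte leaked tail by counting printable bytes. */
static enum tail_class classify_tail(const uint8_t t[4])
{
	int printable = 0, nonzero = 0;

	for (int i = 0; i < 4; i++) {
		if (t[i])
			nonzero++;
		if (isprint(t[i]))
			printable++;
	}
	if (!nonzero)
		return TAIL_ZERO;
	if (printable == 4)
		return TAIL_FULLY_PRINTABLE;
	if (printable == 3)
		return TAIL_MOSTLY_PRINTABLE;
	return TAIL_OTHER;
}
```

With these assumed thresholds a tail such as 72 6f 6f 74 ("root")
lands in the fully-printable bucket and 81 88 ff ff in "other".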

Representative printable fragments observed on the attacker
side:

  74 6f 70 2e    "top."
  66 72 65 65    "free"
  45 78 65 63    "Exec"
  2f 73 79 73    "/sys"
  72 6f 6f 74    "root"
  45 56 50 41    "EVPA"
  43 4f 44 45    "CODE"

Partial pointer-like recoveries (4-byte words ending in the
kernel-direct-map prefix 0xffff....):

  3,361 observations ending in ffff
  1,364 unique ....ffff tails
  most common:
    81 88 ff ff    LE 0xffff8881    1.68% of observed writes
    80 88 ff ff    LE 0xffff8880    0.22%
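
For readers checking the "LE" column, the byte sequences above decode
as little-endian 32-bit words. A minimal sketch follows; le32_decode()
and has_ffff_prefix() are hypothetical helper names for illustration,
not part of the harness or the driver.

```c
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes stored least-significant first. */
static uint32_t le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* True when the top 16 bits of the decoded word are 0xffff. */
static int has_ffff_prefix(uint32_t w)
{
	return (w >> 16) == 0xffff;
}
```

Feeding in 81 88 ff ff gives 0xffff8881, matching the first row of
the list.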

The run did not recover full 64-bit kernel virtual addresses
(only 4 bytes per probe are attacker-observable), but the
partial pointer material is consistent with a KASLR-weakening
primitive under sustained probing. With the fix applied, the
same harness leaves the attacker MR all-zero.
---
drivers/infiniband/sw/rxe/rxe_resp.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 711f73e0bbb1..09ba21d0f3c4 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -526,7 +526,19 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	}
 
 skip_check_range:
-	if (pkt->mask & (RXE_WRITE_MASK | RXE_ATOMIC_WRITE_MASK)) {
+	if (pkt->mask & RXE_ATOMIC_WRITE_MASK) {
+		/* IBA oA19-28: ATOMIC_WRITE payload is exactly 8 bytes.
+		 * Reject any other length before the responder reads
+		 * sizeof(u64) bytes from payload_addr(pkt); a shorter
+		 * payload would read past the logical end of the packet
+		 * into skb->head tailroom.
+		 */
+		if (resid != sizeof(u64) || pktlen != sizeof(u64) ||
+		    bth_pad(pkt)) {
+			state = RESPST_ERR_LENGTH;
+			goto err;
+		}
+	} else if (pkt->mask & RXE_WRITE_MASK) {
 		if (resid > mtu) {
 			if (pktlen != mtu || bth_pad(pkt)) {
 				state = RESPST_ERR_LENGTH;
--
2.53.0