[PATCH v2] ceph: fix multiple unsafe decodes in decode_locker()

From: Pavitra Jha

Date: Thu May 28 2026 - 09:02:04 EST


decode_locker() in cls_lock_client.c contains three unsafe decode
operations that allow a malicious or compromised OSD to trigger
slab-out-of-bounds reads:

1. ceph_decode_copy() at the locker_id_t name field has no preceding
bounds check. With p == end after ceph_start_decoding() accepts
struct_len=0, this reads sizeof(ceph_entity_name) = 9 bytes past
the validated buffer boundary.

2. *p += sizeof(struct ceph_timespec) after the locker_info_t header
is an unchecked pointer advance. A malicious OSD can position p
past end, causing all subsequent _safe checks to pass against a
bogus boundary.

3. len = ceph_decode_32(p) has no preceding bounds check, and the
immediately following *p += len is uncapped. A malicious OSD can
send len=0xffffffff, advancing p gigabytes past end and escaping
the decode window entirely.

Fix all three by replacing bare operations with their safe variants:
ceph_decode_copy -> ceph_decode_copy_safe
*p += sizeof(...) -> ceph_decode_skip_n
ceph_decode_32(p) -> ceph_decode_32_safe
*p += len -> ceph_decode_skip_n

A new out_bad: label is added to return -EINVAL on any bounds
violation. -EINVAL is appropriate here: the data received from the OSD
is structurally malformed, which is an invalid argument to the decode
contract regardless of whether the caller or the wire is at fault.

KASAN report (kernel 7.0.0-rc7, QEMU/x86_64, KASLR disabled):

[ 26.183969] ceph_oob4_poc: buf=ffff888009e31000 end=ffff888009e31fa0
[ 26.186087] ceph_oob4_poc: struct_v=1 struct_len=0 p==end: 1
[ 26.186738] ceph_oob4_poc: triggering bare ceph_decode_32 past slab boundary...
[ 26.187679] ==================================================================
[ 26.188236] BUG: KASAN: slab-out-of-bounds in ceph_oob4_init+0x22b/0xff0 [ceph_oob4_poc]
[ 26.188236] Read of size 4 at addr ffff888009e31fa0 by task insmod/59
[ 26.188236] CPU: 0 UID: 0 PID: 59 Comm: insmod Tainted: G O 7.0.0-rc7-g9c2abf69da83-dirty #15 PREEMPT(lazy)
[ 26.188236] Call Trace:
[ 26.188236] <TASK>
[ 26.188236] dump_stack_lvl+0x4d/0x70
[ 26.188236] print_report+0x170/0x4f3
[ 26.188236] kasan_report+0xda/0x110
[ 26.188236] ceph_oob4_init+0x22b/0xff0 [ceph_oob4_poc]
[ 26.188236] do_one_initcall+0x9a/0x3a0
[ 26.188236] do_init_module+0x27c/0x790
[ 26.188236] load_module+0x4a9a/0x6350
[ 26.188236] init_module_from_file+0x15c/0x180
[ 26.188236] idempotent_init_module+0x21f/0x750
[ 26.188236] __x64_sys_finit_module+0xba/0x120
[ 26.188236] do_syscall_64+0xe2/0x570
[ 26.188236] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 26.188236] </TASK>
[ 26.188236] The buggy address belongs to the object at ffff888009e31000
[ 26.188236] which belongs to the cache kmalloc-4k of size 4096
[ 26.188236] The buggy address is located 0 bytes to the right of
[ 26.188236] allocated 4000-byte region [ffff888009e31000, ffff888009e31fa0)
[ 26.188236] ffff888009e31f80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
[ 26.188236] ^
[ 26.188236] ffff888009e32000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 26.188236] ==================================================================
[ 26.255513] ceph_oob4_poc: len=0xcccccccc (OOB garbage from KASAN redzone)

0xCCCCCCCC is KASAN redzone poison, confirming the read landed in
the slab redzone immediately past the 4000-byte allocation.

Attacker model: a malicious or compromised OSD in a multi-tenant Ceph
deployment can trigger this against any kernel client that issues the
lock.get_info class method (e.g. during RBD exclusive lock acquisition)
without any further privileges beyond OSD session establishment.

Fixes: d4ed4a530562 ("libceph: support for lock.lock_info")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Pavitra Jha <jhapavitra98@xxxxxxxxx>
---
v2: Move inline comments above ceph_decode_skip_n calls to stay within
the 80-column limit, and rename label bad -> out_bad, per
Viacheslav Dubeyko's review.
---
net/ceph/cls_lock_client.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ceph/cls_lock_client.c b/net/ceph/cls_lock_client.c
index 78276273c..4f27b3d15 100644
--- a/net/ceph/cls_lock_client.c
+++ b/net/ceph/cls_lock_client.c
@@ -259,7 +259,7 @@ static int decode_locker(void **p, void *end, struct ceph_locker *locker)
if (ret)
return ret;

- ceph_decode_copy(p, &locker->id.name, sizeof(locker->id.name));
+ ceph_decode_copy_safe(p, end, &locker->id.name, sizeof(locker->id.name), out_bad);
s = ceph_extract_encoded_string(p, end, NULL, GFP_NOIO);
if (IS_ERR(s))
return PTR_ERR(s);
@@ -270,19 +270,23 @@ static int decode_locker(void **p, void *end, struct ceph_locker *locker)
if (ret)
return ret;

- *p += sizeof(struct ceph_timespec); /* skip expiration */
+ /* skip expiration */
+ ceph_decode_skip_n(p, end, sizeof(struct ceph_timespec), out_bad);

ret = ceph_decode_entity_addr(p, end, &locker->info.addr);
if (ret)
return ret;

- len = ceph_decode_32(p);
- *p += len; /* skip description */
+ ceph_decode_32_safe(p, end, len, out_bad);
+ /* skip description */
+ ceph_decode_skip_n(p, end, len, out_bad);

dout("%s %s%llu cookie %s addr %s\n", __func__,
ENTITY_NAME(locker->id.name), locker->id.cookie,
ceph_pr_addr(&locker->info.addr));
return 0;
+out_bad:
+ return -EINVAL;
}

static int decode_lockers(void **p, void *end, u8 *type, char **tag,
--
2.53.0