Re: [PATCH 4/4] ceph: cap delegated inode count in ceph_parse_deleg_inos()
From: Viacheslav Dubeyko
Date: Thu Jun 04 2026 - 17:06:25 EST
On Thu, 2026-06-04 at 14:09 -0400, Michael Bommarito wrote:
> ceph_parse_deleg_inos() decodes interval sets of delegated inode numbers
> from an MDS create-with-delegation reply. For each set it reads a 64-bit
> start and a 64-bit len with ceph_decode_64_safe(), which only validates
> that the eight bytes are present in the message, not the value, and then
> loops:
>
> while (len--) {
> xa_insert(&s->s_delegated_inos, start++,
> DELEGATED_INO_AVAILABLE, GFP_KERNEL);
> ...
> }
>
> len is fully attacker controlled. A malicious or compromised MDS can send
> a create reply whose interval set declares len near 2^63, driving an
> effectively unbounded loop that performs a GFP_KERNEL xarray insert on
> each iteration. This spins a kernel thread in the reply dispatch path and
> exhausts memory; on a client that has negotiated CEPHFS_FEATURE_DELEG_INO
> (enabled by default) a single reply is enough to wedge the mount.
>
> A legitimate MDS delegates only a small range per set (the userspace MDS
> prealloc window, mds_client_prealloc_inos, defaults to 1000). Reject any
> set whose len exceeds CEPH_MAX_DELEG_INOS (1M, far above any legitimate
> value) by treating the reply as malformed and returning -EIO, consistent
> with the existing decode error handling. Normal delegations are well under
> the cap and are unaffected.
>
> Fixes: d4846487870897 ("ceph: decode interval_sets for delegated inos")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Michael Bommarito <michael.bommarito@xxxxxxxxx>
> Assisted-by: Claude:claude-opus-4-8
> ---
> fs/ceph/mds_client.c | 14 ++++++++++++++
> fs/ceph/super.h | 9 +++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 4f36ac73305dc..0a084c4f3aae2 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -633,6 +633,20 @@ static int ceph_parse_deleg_inos(void **p, void *end,
> start, len);
> continue;
> }
> +
> + /*
> + * A legitimate MDS delegates a small range per set. Treat a
> + * count larger than any plausible delegation window as a
> + * malformed reply rather than spinning while(len--) and
> + * inserting unbounded xarray entries.
> + */
> + if (len > CEPH_MAX_DELEG_INOS) {
> + pr_warn_ratelimited_client(cl,
> + "rejecting oversized inode range delegation (start=0x%llx len=0x%llx)\n",
> + start, len);
> + return -EIO;
> + }
> +
> while (len--) {
> int err = xa_insert(&s->s_delegated_inos, start++,
> DELEGATED_INO_AVAILABLE,
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index afc89ce91804e..43a9b075f344c 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -634,6 +634,15 @@ static inline int ceph_ino_compare(struct inode *inode, void *data)
> #define CEPH_MDS_INO_LOG_OFFSET (2 * CEPH_MAX_MDS)
> #define CEPH_INO_SYSTEM_BASE ((6*CEPH_MAX_MDS) + (CEPH_MAX_MDS * CEPH_NUM_STRAY))
>
> +/*
> + * Upper bound on the number of inodes the MDS may delegate to a client in a
> + * single interval set. The userspace MDS hands out at most a few thousand
> + * (mds_client_prealloc_inos, default 1000); 1M is far above any legitimate
> + * value and guards ceph_parse_deleg_inos() against an unbounded loop driven
> + * by an attacker controlled 64-bit length.
> + */
> +#define CEPH_MAX_DELEG_INOS (1024 * 1024)
Technically speaking, I am not completely convinced by this limit.
Also, the limit applies only to a single interval set. If an attacker
repeats this multiple times, we can still end up under memory pressure
or run out of memory. I think we need to consider a more robust
solution. Could we improve the fix?
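
As a rough illustration of one possible direction, here is a userspace
model of cumulative accounting: a per-session budget on outstanding
delegated inodes, checked in addition to the per-set cap. This is only a
sketch, not kernel code. The names deleg_account(), deleg_release(),
struct sess_model and both MODEL_* limits are hypothetical, and the
session budget value is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both limits are hypothetical and exist only for this illustration. */
#define MODEL_MAX_PER_SET   (1024ULL * 1024ULL)         /* same value as the patch */
#define MODEL_MAX_PER_SESS  (4ULL * MODEL_MAX_PER_SET)  /* assumed session budget */

/* One decoded interval set: [start, start + len). */
struct deleg_set {
	uint64_t start;
	uint64_t len;
};

/* Stand-in for per-session state; 'outstanding' models how many
 * delegated inode entries the session currently holds. */
struct sess_model {
	uint64_t outstanding;
};

/*
 * Validate every set of one reply before anything would be inserted.
 * Returns 0 and charges the session budget on success. Returns -1
 * (think -EIO) if any set is oversized, if start + len does not fit in
 * 64 bits, or if the reply would push the session over its budget.
 * On failure the session state is left untouched.
 */
static int deleg_account(struct sess_model *s,
			 const struct deleg_set *sets, size_t nsets)
{
	uint64_t remaining;
	size_t i;

	assert(s->outstanding <= MODEL_MAX_PER_SESS);
	remaining = MODEL_MAX_PER_SESS - s->outstanding;

	for (i = 0; i < nsets; i++) {
		uint64_t len = sets[i].len;

		if (len > MODEL_MAX_PER_SET)
			return -1;
		if (len > UINT64_MAX - sets[i].start)
			return -1;
		if (len > remaining)
			return -1;
		remaining -= len;
	}

	s->outstanding = MODEL_MAX_PER_SESS - remaining;
	return 0;
}

/* Return budget when delegated inodes are consumed or dropped. */
static void deleg_release(struct sess_model *s, uint64_t n)
{
	s->outstanding -= (n < s->outstanding) ? n : s->outstanding;
}
```

The point of the model is that all sets are validated before any insert
happens, so a rejected reply leaves nothing behind, and that repeated
replies draw from one shared budget instead of each getting a fresh 1M
allowance. In the kernel the counter would have to be released wherever
delegated inodes are consumed or the session is torn down; whether this
is the right shape for the fix is open for discussion.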
Thanks,
Slava.
> +
> static inline bool ceph_vino_is_reserved(const struct ceph_vino vino)
> {
> if (vino.ino >= CEPH_INO_SYSTEM_BASE ||