Re: [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers

From: Viacheslav Dubeyko

Date: Fri May 29 2026 - 16:50:34 EST


On Tue, 2026-05-26 at 19:58 -0700, Sam Edwards wrote:
> The fscrypt subsystem uses the scatterlist crypto API, inheriting its
> requirement that any buffers are in the linear mapping region. However,
> the messenger client uses kvmalloc() to create buffers for messages,
> which will occasionally place those buffers in the vmalloc() region when
> physical memory fragmentation doesn't permit a large enough kmalloc().
> The various callers of ceph_fname_to_usr() directly pass (slices of) raw
> messages from the MDS without considering that the messages may be in
> vmalloc() buffers, resulting in oopses especially on non-x86 platforms
> (see 'Closes:' for more details and a reproducer).
>
> Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
> fname->ctext, fname->name, and/or oname->name buffers, using `tname`
> (which, when non-null, must be a linear address; when null, is briefly
> allocated as necessary) as a bounce buffer to avoid passing any
> inappropriate addresses to fscrypt_fname_disk_to_usr().
>
> Additionally change parse_reply_info_readdir() -- the only function to
> supply its own `tname` -- to follow the new "tname must never come from
> vmalloc()" rule by passing NULL when the message is not in the linear
> region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
> exists only when processing the minority of messages that spill into
> vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
> messages. Still, if the overhead proves unreasonable in the future, it
> is easy enough to mitigate: a future change could allocate a bounce
> buffer in parse_reply_info_readdir() and use that as `tname` instead.
>
> Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
> Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_ceph-2Ddevel_CAH5Ym4ga7miUQE0K-2DcJA93Ya7w62P69MAN27R5cBiYnudoOHdA-40mail.gmail.com_T_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=bcX0FhBD6jzWXGOsw2LoJOl_TqgobmwNBmqNIhj2K0qBn2krB8IUrIhcUs8LmJWM&s=2uInY5Ys7xQ_57Ifo4uovP7_e8SN0Q_wnzBDX-uj0hE&e=
> Cc: stable@xxxxxxxxxxxxxxx # v6.6+
> Signed-off-by: Sam Edwards <CFSworks@xxxxxxxxx>
> ---
> fs/ceph/crypto.c | 37 +++++++++++++++++++++++++++++--------
> fs/ceph/mds_client.c | 8 ++++++--
> 2 files changed, 35 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 7515cb251226..61d6830d16bc 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> * Otherwise, base64 decode the string, and then ask fscrypt to format it
> * for userland presentation.
> *
> + * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
> + * linear-mapped region, this function slightly relaxes those requirements:
> + * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
> + *
> * Returns 0 on success or negative error code on error.
> */
> int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> @@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> {
> struct inode *dir = fname->dir;
> struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
> + struct fscrypt_str _oname;
> struct fscrypt_str iname;
> char *name = fname->name;
> int name_len = fname->name_len;
> int ret;
>
> + if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
> + return -EIO;
> +
> /* Sanity check that the resulting name will fit in the buffer */
> if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
> return -EIO;
> @@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> goto out_inode;
> }
>
> + if (!tname && (fname->ctext_len == 0 ||
> + unlikely(is_vmalloc_addr(fname->ctext)) ||
> + unlikely(is_vmalloc_addr(oname->name)))) {
> + ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> + if (ret)
> + goto out_inode;
> + tname = _tname.name;
> + }
> +
> if (fname->ctext_len == 0) {
> int declen;
>
> - if (!tname) {
> - ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> - if (ret)
> - goto out_inode;
> - tname = _tname.name;
> - }
> -
> declen = base64_decode(name, name_len, tname, false,
> BASE64_IMAP);
> if (declen <= 0) {
> @@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> }
> iname.name = tname;
> iname.len = declen;
> + } else if (unlikely(is_vmalloc_addr(fname->ctext))) {
> + memcpy(tname, fname->ctext, fname->ctext_len);
> +
> + iname.name = tname;
> + iname.len = fname->ctext_len;
> } else {
> iname.name = fname->ctext;
> iname.len = fname->ctext_len;
> }
>
> - ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
> + _oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
> + tname : oname->name;
> + _oname.len = oname->len;
> + ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
> + oname->len = _oname.len;
> if (!ret && (dir != fname->dir)) {
> char tmp_buf[BASE64_CHARS(NAME_MAX)];
>
> @@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> oname->len, oname->name, dir->i_ino);
> memcpy(oname->name, tmp_buf, name_len);
> oname->len = name_len;
> + } else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
> + memcpy(oname->name, _oname.name, _oname.len);
> }

When both dir != fname->dir (longname snapshot) and is_vmalloc_addr(oname->name)
are true:

(1) The if branch is taken — NOT the else if.
(2) _oname.name = tname holds the decrypted result (fscrypt wrote there).
(3) oname->name is the stale vmalloc buffer — the copy-back in the else if was
never executed.
(4) The snprintf reads oname->name and formats a snapshot name from garbage.

Am I right?

This part of logic needs to be reworked carefully. This if - else construction
becomes really complicated to understand.

Thanks,
Slava.

>
> out:
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index aa6730b48e97..8fcf185e3a82 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
> * to do the base64_decode in-place. It's
> * safe because the decoded string should
> * always be shorter, which is 3/4 of origin
> - * string.
> + * string. If this message was allocated with
> + * vmalloc() (happens, but rarely), leave it
> + * NULL and let ceph_fname_to_usr() allocate
> + * suitable temporary working space instead.
> */
> - tname = _name;
> + if (likely(!is_vmalloc_addr(_name)))
> + tname = _name;
>
> /*
> * Set oname to _name too, and this will be