Re: [PATCH 2/2] ceph: properly decrypt filenames in vmalloc() buffers
From: Sam Edwards
Date: Fri May 29 2026 - 21:01:29 EST
On Fri, May 29, 2026 at 1:50 PM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> On Tue, 2026-05-26 at 19:58 -0700, Sam Edwards wrote:
> > The fscrypt subsystem uses the scatterlist crypto API, inheriting its
> > requirement that any buffers are in the linear mapping region. However,
> > the messenger client uses kvmalloc() to create buffers for messages,
> > which will occasionally place those buffers in the vmalloc() region when
> > physical memory fragmentation doesn't permit a large enough kmalloc().
> > The various callers of ceph_fname_to_usr() directly pass (slices of) raw
> > messages from the MDS without considering that the messages may be in
> > vmalloc() buffers, resulting in oopses especially on non-x86 platforms
> > (see 'Closes:' for more details and a reproducer).
> >
> > Make ceph_fname_to_usr() explicitly tolerant of vmalloc()-allocated
> > fname->ctext, fname->name, and/or oname->name buffers, using `tname`
> > (which, when non-null, must be a linear address; when null, is briefly
> > allocated as necessary) as a bounce buffer to avoid passing any
> > inappropriate addresses to fscrypt_fname_disk_to_usr().
> >
> > Additionally change parse_reply_info_readdir() -- the only function to
> > supply its own `tname` -- to follow the new "tname must never come from
> > vmalloc()" rule by passing NULL when the message is not in the linear
> > region. Though this causes a per-dentry kmalloc()+kfree(), this overhead
> > exists only when processing the minority of messages that spill into
> > vmalloc(). My (crude) testing puts this at only about 1 in 8,000 readdir
> > messages. Still, if the overhead proves unreasonable in the future, it
> > is easy enough to mitigate: a future change could allocate a bounce
> > buffer in parse_reply_info_readdir() and use that as `tname` instead.
> >
> > Fixes: 457117f077c67 ("ceph: add helpers for converting names for userland presentation")
> > Closes: https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_ceph-2Ddevel_CAH5Ym4ga7miUQE0K-2DcJA93Ya7w62P69MAN27R5cBiYnudoOHdA-40mail.gmail.com_T_&d=DwIDAg&c=BSDicqBQBDjDI9RkVyTcHQ&r=q5bIm4AXMzc8NJu1_RGmnQ2fMWKq4Y4RAkElvUgSs00&m=bcX0FhBD6jzWXGOsw2LoJOl_TqgobmwNBmqNIhj2K0qBn2krB8IUrIhcUs8LmJWM&s=2uInY5Ys7xQ_57Ifo4uovP7_e8SN0Q_wnzBDX-uj0hE&e=
> > Cc: stable@xxxxxxxxxxxxxxx # v6.6+
> > Signed-off-by: Sam Edwards <CFSworks@xxxxxxxxx>
> > ---
> > fs/ceph/crypto.c | 37 +++++++++++++++++++++++++++++--------
> > fs/ceph/mds_client.c | 8 ++++++--
> > 2 files changed, 35 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > index 7515cb251226..61d6830d16bc 100644
> > --- a/fs/ceph/crypto.c
> > +++ b/fs/ceph/crypto.c
> > @@ -298,6 +298,10 @@ int ceph_encode_encrypted_dname(struct inode *parent, char *buf, int elen)
> > * Otherwise, base64 decode the string, and then ask fscrypt to format it
> > * for userland presentation.
> > *
> > + * Though the fscrypt/crypto subsystems broadly expect all buffers to be in the
> > + * linear-mapped region, this function slightly relaxes those requirements:
> > + * fname->ctext, fname->name, and oname->name may be vmalloc(), but not tname.
> > + *
> > * Returns 0 on success or negative error code on error.
> > */
> > int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > @@ -305,11 +309,15 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > {
> > struct inode *dir = fname->dir;
> > struct fscrypt_str _tname = FSTR_INIT(NULL, 0);
> > + struct fscrypt_str _oname;
> > struct fscrypt_str iname;
> > char *name = fname->name;
> > int name_len = fname->name_len;
> > int ret;
> >
> > + if (WARN_ON_ONCE(tname && is_vmalloc_addr(tname)))
> > + return -EIO;
> > +
> > /* Sanity check that the resulting name will fit in the buffer */
> > if (fname->name_len > NAME_MAX || fname->ctext_len > NAME_MAX)
> > return -EIO;
> > @@ -350,16 +358,18 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > goto out_inode;
> > }
> >
> > + if (!tname && (fname->ctext_len == 0 ||
> > + unlikely(is_vmalloc_addr(fname->ctext)) ||
> > + unlikely(is_vmalloc_addr(oname->name)))) {
> > + ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > + if (ret)
> > + goto out_inode;
> > + tname = _tname.name;
> > + }
> > +
> > if (fname->ctext_len == 0) {
> > int declen;
> >
> > - if (!tname) {
> > - ret = fscrypt_fname_alloc_buffer(NAME_MAX, &_tname);
> > - if (ret)
> > - goto out_inode;
> > - tname = _tname.name;
> > - }
> > -
> > declen = base64_decode(name, name_len, tname, false,
> > BASE64_IMAP);
> > if (declen <= 0) {
> > @@ -368,12 +378,21 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > }
> > iname.name = tname;
> > iname.len = declen;
> > + } else if (unlikely(is_vmalloc_addr(fname->ctext))) {
> > + memcpy(tname, fname->ctext, fname->ctext_len);
> > +
> > + iname.name = tname;
> > + iname.len = fname->ctext_len;
> > } else {
> > iname.name = fname->ctext;
> > iname.len = fname->ctext_len;
> > }
> >
> > - ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, oname);
> > + _oname.name = unlikely(is_vmalloc_addr(oname->name)) ?
> > + tname : oname->name;
> > + _oname.len = oname->len;
> > + ret = fscrypt_fname_disk_to_usr(dir, 0, 0, &iname, &_oname);
> > + oname->len = _oname.len;
> > if (!ret && (dir != fname->dir)) {
> > char tmp_buf[BASE64_CHARS(NAME_MAX)];
> >
> > @@ -381,6 +400,8 @@ int ceph_fname_to_usr(const struct ceph_fname *fname, unsigned char *tname,
> > oname->len, oname->name, dir->i_ino);
> > memcpy(oname->name, tmp_buf, name_len);
> > oname->len = name_len;
> > + } else if (!ret && unlikely(is_vmalloc_addr(oname->name))) {
> > + memcpy(oname->name, _oname.name, _oname.len);
> > }
>
> When both dir != fname->dir (longname snapshot) and is_vmalloc_addr(oname->name)
> are true:
>
> (1) The if branch is taken — NOT the else if.
> (2) _oname.name = tname holds the decrypted result (fscrypt wrote there).
> (3) oname->name is the stale vmalloc buffer — the copy-back in the else if was
> never executed.
> (4) The snprintf reads oname->name and formats a snapshot name from garbage.
>
> Am I right?
Hi Slava,
Indeed you are. Well spotted, I should have examined that snprintf a
little more closely.
> This part of logic needs to be reworked carefully. This if - else construction
> becomes really complicated to understand.
ACK - I was originally just going to amend the patch with that fixed:
char tmp_buf[BASE64_CHARS(NAME_MAX)];
name_len = snprintf(tmp_buf, sizeof(tmp_buf), "_%.*s_%llu",
- oname->len, oname->name, dir->i_ino);
+ _oname.len, _oname.name, dir->i_ino);
memcpy(oname->name, tmp_buf, name_len);
oname->len = name_len;
...but, looking at the if/else-if block more carefully, I realized we
only care about ensuring `!ret`, so I'll simplify the control flow for
v2.
Thank you for the feedback,
Sam
> Thanks,
> Slava.
>
> >
> > out:
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index aa6730b48e97..8fcf185e3a82 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -538,9 +538,13 @@ static int parse_reply_info_readdir(void **p, void *end,
> > * to do the base64_decode in-place. It's
> > * safe because the decoded string should
> > * always be shorter, which is 3/4 of origin
> > - * string.
> > + * string. If this message was allocated with
> > + * vmalloc() (happens, but rarely), leave it
> > + * NULL and let ceph_fname_to_usr() allocate
> > + * suitable temporary working space instead.
> > */
> > - tname = _name;
> > + if (likely(!is_vmalloc_addr(_name)))
> > + tname = _name;
> >
> > /*
> > * Set oname to _name too, and this will be