Re: [RFC/BUG] Use a bounce buffer for mds client decryption

From: Sam Edwards

Date: Wed Apr 15 2026 - 15:51:38 EST

On Wed, Apr 15, 2026 at 12:14 PM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> On Tue, 2026-04-14 at 20:40 -0700, Sam Edwards wrote:
> > Hi Ceph list,
> >
> > This is a combination RFC and bug report. I'm holding off cleaning up the patch
> > until I get confirmation that this is the right approach. There are some edge
> > cases stemming from complexities in ceph_fname_to_usr(), making this patch into
> > more of a shotgun bugfix than I'd like.
> >

Good day Slava,

> > I have thoroughly tested this patch:
> >
>
> Even really good testing cannot guarantee from potential bugs. :) Have you run
> xfststes/fstests regression testing suite for the patch? Without this run you
> cannot say that you tested your patch good enough. :)

Not the kind of testing I meant: the next paragraph describes thorough
*soak* testing. Since this is only an RFC (lacking my Signed-off-by
and the usual QA that it implies), I'm clarifying that I have at least
made the bug go away. I do not promise an absence of regressions.

> > In my live (AArch64) environment, certain
> > workloads would trigger this particular oops->panic 3-5 times per day. Since I
> > applied this patch ~2 weeks ago, I get no further panics. I do not think my MDS
> > contains any base64-only dentries, however.
> >
> > ---
> > fs/ceph/mds_client.c | 30 ++++++++++++++++++++++++++++--
> > 1 file changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index b1746273f186..a583032c0041 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -414,6 +414,9 @@ static int parse_reply_info_readdir(void **p, void *end,
> > {
> > struct ceph_mds_reply_info_parsed *info = &req->r_reply_info;
> > struct ceph_client *cl = req->r_mdsc->fsc->client;
> > + struct fscrypt_str bname = FSTR_INIT(NULL, 0);
>
> Do we really need to introduce another bname variable? Could we use oname for
> the buffer?

NAK: ceph_fname_to_usr() redirects oname.name under certain conditions
(e.g. no key available), so we risk leaking the bounce buffer if we do
that.

>
> > + struct inode *inode = d_inode(req->r_dentry);
> > + struct ceph_inode_info *ci = ceph_inode(inode);
> > u32 num, i = 0;
> > int err;
> >
> > @@ -441,10 +444,18 @@ static int parse_reply_info_readdir(void **p, void *end,
> > goto bad;
> > }
> >
> > + /*
> > + * `p` points into the message's `front` buffer, which ceph_msg_new2()
> > + * allocates using kvmalloc(), so the buffer may end up outside of the
> > + * kmalloc() region -- but fscript_str.name must be in that region, so
> > + * use a bounce buffer in that case
> > + */
> > + if (unlikely(is_vmalloc_addr(*p)))
> > + if ((err = ceph_fname_alloc_buffer(inode, &bname)))
> > + goto out_bad;
>
> Kernel style strongly prefers braces around the outer if body. What's about
> this?
>
> if (unlikely(is_vmalloc_addr(*p))) {
> err = ceph_fname_alloc_buffer(inode, &bname);
> if (err)
> goto out_bad;
> }

ACK

>
> > +
> > info->dir_nr = num;
> > while (num) {
> > - struct inode *inode = d_inode(req->r_dentry);
> > - struct ceph_inode_info *ci = ceph_inode(inode);
> > struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > struct fscrypt_str tname = FSTR_INIT(NULL, 0);
> > struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > @@ -514,6 +525,13 @@ static int parse_reply_info_readdir(void **p, void *end,
> > oname.name = altname;
> > oname.len = altname_len;
> > }
> > +
> > + if (bname.name) {
>
> Should we have unlikely(bname.name) here?

ACK

>
> > + BUG_ON(oname.len > bname.len);
>
> The ceph_decode_need confirms there are enough bytes in the wire buffer, but
> there is no NAME_MAX check. A misbehaving or malicious MDS can set altname_len >
> NAME_MAX, so oname.len > bname.len. I believe we need to process likewise error
> in more calm way. What's about this?
>
> if (oname.len > bname.len) {
> err = -EIO;
> goto out_bad;
> }

ACK -- with a WARN_ON_ONCE() as suggested by Alex.

>
> > + memcpy(bname.name, oname.name, oname.len);
> > + fname.ctext = tname.name = oname.name = bname.name;
>
> For my taste, the function is complicated enough already. I believe it requires
> some better refactoring.

Well this is an RFC, I'm seeking feedback on that refactoring! I agree
about the complexity. What way of writing the function would be to
your taste?

> Thanks,
> Slava.

Regards,
Sam