RE: [RFC/BUG] Use a bounce buffer for mds client decryption

From: Viacheslav Dubeyko

Date: Thu Apr 16 2026 - 14:58:41 EST

On Wed, 2026-04-15 at 12:51 -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 12:14 PM Viacheslav Dubeyko
> <Slava.Dubeyko@xxxxxxx> wrote:
> >
> > On Tue, 2026-04-14 at 20:40 -0700, Sam Edwards wrote:
> > > Hi Ceph list,
> > >
> > > This is a combination RFC and bug report. I'm holding off cleaning up the patch
> > > until I get confirmation that this is the right approach. There are some edge
> > > cases stemming from complexities in ceph_fname_to_usr(), making this patch into
> > > more of a shotgun bugfix than I'd like.
> > >
>
> Good day Slava,
>
> > > I have thoroughly tested this patch:
> > >
> >
> > Even really good testing cannot guarantee from potential bugs. :) Have you run
> > xfststes/fstests regression testing suite for the patch? Without this run you
> > cannot say that you tested your patch good enough. :)
>
> Not the kind of testing I meant: the next paragraph describes thorough
> *soak* testing. Since this is only an RFC (lacking my Signed-off-by
> and the usual QA that it implies), I'm clarifying that I have at least
> made the bug go away. I do not promise an absence of regressions.
>
> > > In my live (AArch64) environment, certain
> > > workloads would trigger this particular oops->panic 3-5 times per day. Since I
> > > applied this patch ~2 weeks ago, I get no further panics. I do not think my MDS
> > > contains any base64-only dentries, however.
> > >
> > > ---
> > > fs/ceph/mds_client.c | 30 ++++++++++++++++++++++++++++--
> > > 1 file changed, 28 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index b1746273f186..a583032c0041 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -414,6 +414,9 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > {
> > > struct ceph_mds_reply_info_parsed *info = &req->r_reply_info;
> > > struct ceph_client *cl = req->r_mdsc->fsc->client;
> > > + struct fscrypt_str bname = FSTR_INIT(NULL, 0);
> >
> > Do we really need to introduce another bname variable? Could we use oname for
> > the buffer?
>
> NAK: ceph_fname_to_usr() redirects oname.name under certain conditions
> (e.g. no key available), so we risk leaking the bounce buffer if we do
> that.
>
> >
> > > + struct inode *inode = d_inode(req->r_dentry);
> > > + struct ceph_inode_info *ci = ceph_inode(inode);
> > > u32 num, i = 0;
> > > int err;
> > >
> > > @@ -441,10 +444,18 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > goto bad;
> > > }
> > >
> > > + /*
> > > + * `p` points into the message's `front` buffer, which ceph_msg_new2()
> > > + * allocates using kvmalloc(), so the buffer may end up outside of the
> > > + * kmalloc() region -- but fscript_str.name must be in that region, so
> > > + * use a bounce buffer in that case
> > > + */
> > > + if (unlikely(is_vmalloc_addr(*p)))
> > > + if ((err = ceph_fname_alloc_buffer(inode, &bname)))
> > > + goto out_bad;
> >
> > Kernel style strongly prefers braces around the outer if body. What's about
> > this?
> >
> > if (unlikely(is_vmalloc_addr(*p))) {
> > err = ceph_fname_alloc_buffer(inode, &bname);
> > if (err)
> > goto out_bad;
> > }
>
> ACK
>
> >
> > > +
> > > info->dir_nr = num;
> > > while (num) {
> > > - struct inode *inode = d_inode(req->r_dentry);
> > > - struct ceph_inode_info *ci = ceph_inode(inode);
> > > struct ceph_mds_reply_dir_entry *rde = info->dir_entries + i;
> > > struct fscrypt_str tname = FSTR_INIT(NULL, 0);
> > > struct fscrypt_str oname = FSTR_INIT(NULL, 0);
> > > @@ -514,6 +525,13 @@ static int parse_reply_info_readdir(void **p, void *end,
> > > oname.name = altname;
> > > oname.len = altname_len;
> > > }
> > > +
> > > + if (bname.name) {
> >
> > Should we have unlikely(bname.name) here?
>
> ACK
>
> >
> > > + BUG_ON(oname.len > bname.len);
> >
> > The ceph_decode_need confirms there are enough bytes in the wire buffer, but
> > there is no NAME_MAX check. A misbehaving or malicious MDS can set altname_len >
> > NAME_MAX, so oname.len > bname.len. I believe we need to process likewise error
> > in more calm way. What's about this?
> >
> > if (oname.len > bname.len) {
> > err = -EIO;
> > goto out_bad;
> > }
>
> ACK -- with a WARN_ON_ONCE() as suggested by Alex.
>
> >
> > > + memcpy(bname.name, oname.name, oname.len);
> > > + fname.ctext = tname.name = oname.name = bname.name;
> >
> > For my taste, the function is complicated enough already. I believe it requires
> > some better refactoring.
>
> Well this is an RFC, I'm seeking feedback on that refactoring! I agree
> about the complexity. What way of writing the function would be to
> your taste?
>
>

We need to manage the complexity of the function. It sounds to me the necessity
of introducing some structure that can include these tname, oname, fname and
related items. Also, some short functions can be introduced to contain the
operations with this structure. This direction can make the
parse_reply_info_readdir() more compact and easy to understand. I think that
suggested memory management logic can be implemented as another structure and
method(s) that can hide the management logic from the parse_reply_info_readdir()
workflow.

Thanks,
Slava.