Re: [PATCH 2/2] xfs: make sure link path does not go away at access

From: Ian Kent
Date: Thu Nov 11 2021 - 18:10:34 EST


On Thu, 2021-11-11 at 11:08 -0500, Brian Foster wrote:

Hi Brian,

> On Thu, Nov 11, 2021 at 11:39:30AM +0800, Ian Kent wrote:
> > When following a trailing symlink in rcu-walk mode it's possible to
> > succeed in getting the ->get_link() method pointer but the link
> > path
> > string be deallocated while it's being used.
> >
> > Utilize the rcu mechanism to mitigate this risk.
> >
> > Suggested-by: Miklos Szeredi <miklos@xxxxxxxxxx>
> > Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
> > ---
> >  fs/xfs/kmem.h      |    4 ++++
> >  fs/xfs/xfs_inode.c |    4 ++--
> >  fs/xfs/xfs_iops.c  |   10 ++++++++--
> >  3 files changed, 14 insertions(+), 4 deletions(-)
> >
> ...
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index a607d6aca5c4..2977e19da7b7 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -524,11 +524,17 @@ xfs_vn_get_link_inline(
> >  
> >         /*
> >          * The VFS crashes on a NULL pointer, so return -
> > EFSCORRUPTED if
> > -        * if_data is junk.
> > +        * if_data is junk. Also, if the path walk is in rcu-walk
> > mode
> > +        * and the inode link path has gone away due inode re-use
> > we have
> > +        * no choice but to tell the VFS to redo the lookup.
> >          */
> > -       link = ip->i_df.if_u1.if_data;
> > +       link = rcu_dereference(ip->i_df.if_u1.if_data);
> > +       if (!dentry && !link)
> > +               return ERR_PTR(-ECHILD);
> > +
>
> One thing that concerns me slightly about this approach is that inode
> reuse does not necessarily guarantee that if_data is NULL. It seems
> technically just as possible (even if exceedingly unlikely) for link
> to
> point at newly allocated memory since the previous sequence count
> validation check. The inode could be reused as another inline symlink
> for example, though it's not clear to me if that is really a problem
> for
> the vfs (assuming a restart would just land on the new link
> anyways?).
> But the inode could also be reallocated as something like a shortform
> directory, which means passing directory header data or whatever that
> it
> stores in if_data back to pick_link(), which is then further
> processed
> as a string.

This is the sort of feedback I was hoping for.

This sounds related to the life-cycle of xfs inodes and re-use.
Hopefully someone here on the list can enlighten me on this.

The thing that comes to mind is that the inode re-use would
need to occur between the VFS check that validates the inode
is still ok and the use of link string. I think that can still
go away even with the above check.

Hopefully someone can clarify what happens here.

>
> With that, I wonder why we wouldn't just return -ECHILD here like we
> do
> for the non-inline case to address the immediate problem, and then
> perhaps separately consider if we can rework bits of the
> reuse/reclaim
> code to allow rcu lookup of inline symlinks under certain conditions.

Always switching to ref-walk mode would certainly resolve the
problem too, yes, perhaps we have no choice ...

Ian
>
> FWIW, I'm also a little curious why we don't set i_link for inline
> symlinks. I don't think that addresses this validation problem, but
> perhaps might allow rcu lookups in the inline symlink common case
> where
> things don't change during the lookup (and maybe even eliminate the
> need
> for this custom inline callback)..?
>
> Brian
>
> >         if (XFS_IS_CORRUPT(ip->i_mount, !link))
> >                 return ERR_PTR(-EFSCORRUPTED);
> > +
> >         return link;
> >  }
> >  
> >
> >
>