Re: [PATCH] vfs: check dentry is still valid in get_link()
From: Ian Kent
Date: Mon Jan 17 2022 - 21:53:03 EST
On Tue, 2022-01-18 at 01:32 +0000, Al Viro wrote:
> On Mon, Jan 17, 2022 at 07:48:49PM +0000, Al Viro wrote:
> > > But that critically depends upon the contents not getting
> > > mangled. If it
> > > *can* be screwed by such unlink, we risk successful lookup
> > > leading to the
> > > wrong place, with nothing to tell us that it's happening. We
> > > could handle
> > > that by adding a check to fs/namei.c:put_link(), and propagating
> > > the error
> > > to callers. It's not impossible, but it won't be pretty.
> > >
> > > And that assumes we avoid oopsen on string changing under us in
> > > the first
> > > place. Which might or might not be true - I hadn't finished the
> > > audit yet.
> > > Note that it's *NOT* just fs/namei.c + fs/dcache.c + some fs
> > > methods -
> > > we need to make sure that e.g. everything called by ->d_hash()
> > > instances
> > > is OK with strings changing right under them. Including
> > > utf8_to_utf32(),
> > > crc32_le(), utf8_casefold_hash(), etc.
> >
> > And AFAICS, ext4, xfs and possibly ubifs (I'm unfamiliar with that
> > one and
> > the call chains there are deep enough for me to miss something)
> > have the
> > "bugger the contents of string returned by RCU ->get_link() if
> > unlink()
> > happens" problem.
> >
> > I would very much prefer to have them deal with that crap,
> > especially
> > since I don't see why does ext4_evict_inode() need to do that
> > memset() -
> > can't we simply check ->i_op in ext4_can_truncate() and be done
> > with
> > that?
>
> This reuse-without-delay has another fun side, AFAICS. Suppose the
> new use
> for inode comes with the same ->i_op (i.e. it's a symlink again) and
> it
> happens right after ->get_link() has returned the pointer to body.
>
> We are already past whatever checks we might add in pick_link(). And
> the
> pointer is still valid. So we end up quietly traversing the body of
> completely unrelated symlink that never had been anywhere near any
> directory
> we might be looking at. With no indication of anything going wrong -
> just
> a successful resolution with bogus result.
Wouldn't that case be caught by the unlazy call since ->get_link()
needs to return -ECHILD for the rcu case now (in xfs anyway)?
>
> Could XFS folks explain what exactly goes wrong if we make actual
> marking
> inode as ready for reuse RCU-delayed, by shifting just that into
> ->free_inode()? Why would we need any extra synchronize_rcu()
> anywhere?