Re: [PATCH] vfs: check dentry is still valid in get_link()
From: Ian Kent
Date: Sun Jan 16 2022 - 21:55:45 EST
On Sat, 2022-01-15 at 06:38 +0000, Al Viro wrote:
> On Mon, Jan 10, 2022 at 05:11:31PM +0800, Ian Kent wrote:
> > When following a trailing symlink in rcu-walk mode it's possible
> > for
> > the dentry to become invalid between the last dentry seq lock check
> > and getting the link (eg. an unlink) leading to a backtrace similar
> > to this:
> >
> > crash> bt
> > PID: 10964 TASK: ffff951c8aa92f80 CPU: 3 COMMAND: "TaniumCX"
> > …
> > #7 [ffffae44d0a6fbe0] page_fault at ffffffff8d6010fe
> > [exception RIP: unknown or invalid address]
> > RIP: 0000000000000000 RSP: ffffae44d0a6fc90 RFLAGS: 00010246
> > RAX: ffffffff8da3cc80 RBX: ffffae44d0a6fd30 RCX:
> > 0000000000000000
> > RDX: ffffae44d0a6fd98 RSI: ffff951aa9af3008 RDI:
> > 0000000000000000
> > RBP: 0000000000000000 R8: ffffae44d0a6fb94 R9:
> > 0000000000000000
> > R10: ffff951c95d8c318 R11: 0000000000080000 R12:
> > ffffae44d0a6fd98
> > R13: ffff951aa9af3008 R14: ffff951c8c9eb840 R15:
> > 0000000000000000
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > #8 [ffffae44d0a6fc90] trailing_symlink at ffffffff8cf24e61
> > #9 [ffffae44d0a6fcc8] path_lookupat at ffffffff8cf261d1
> > #10 [ffffae44d0a6fd28] filename_lookup at ffffffff8cf2a700
> > #11 [ffffae44d0a6fe40] vfs_statx at ffffffff8cf1dbc4
> > #12 [ffffae44d0a6fe98] __do_sys_newstat at ffffffff8cf1e1f9
> > #13 [ffffae44d0a6ff38] do_syscall_64 at ffffffff8cc0420b
> >
> > Most of the time this is not a problem because the inode is
> > unchanged
> > while the rcu read lock is held.
> >
> > But xfs can re-use inodes which can result in the inode -
> > >get_link()
> > method becoming invalid (or NULL).
>
> Without an RCU delay? Then we have much worse problems...
Sorry for the delay.
That was a problem that was discussed at length with the original post
of this patch that included a patch for this too (misguided though it
was).
That discussion resulted in Darrick merging the problem xfs inline
symlink handling with the xfs normal symlink handling.
Another problem with these inline syslinks was they would hand a
pointer to internal xfs storage to the VFS. Darrick's change
allocates and copies the link then hands it to the VFS to free
after use. And since there's an allocation in the symlink handler
the rcu-walk case returns -ECHILD (on passed NULL dentry) so the
VFS will call unlazy before that next call which I think is itself
enough to resolve this problem.
The only thing I think might be questionable is the VFS copy of the
inode pointer but I think the inode is rcu freed so it will be
around and the seq count will have changed so I think it should be
ok.
If I'm missing something please say so, ;)
Darrick's patch is (was last I looked) in his xfs-next tree.
Ian