Re: KASAN: use-after-free Read in path_lookupat
From: Linus Torvalds
Date: Mon Mar 25 2019 - 17:45:24 EST
On Mon, Mar 25, 2019 at 2:14 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Maybe, but we really need to come up with sane documentation on the
> entire drop_inode/evict_inode/destroy_inode/rcu_destroy_inode
> group ;-/
Yeah.
I actually think the "destroy_inode/rcu_destroy_inode" part is the
simplest one to understand: it's just tearing down the inode, and the
RCU part is (obviously, as shown by this thread) somewhat subtle, but
at the same time not really all that hard to explain: "inodes that can
be looked up through RCU lookup need to be destroyed with RCU delay".
I think drop_inode->evict_inode is arguably the much more subtle
stuff. That's the "we're not destroying things, we're making decisions
about the state" kind of thing.
And we actually don't have any documentation at all for those two.
Well, there's the "porting" thing, but..
> And I want to understand the writeback-related issues
> in ocfs2 and f2fs - the current kludges in those smell fishy.
I'm assuming you're talking about exactly that drop_inode() kind of subtlety.
At least for ocfs2, the whole "rcu_destroy_inode()" patch I sent out
would simplify things a lot. *All* that the ocfs2_destroy_inode()
function does is to schedule the inode freeing for RCU delay.
Assuming my patch is fine (as mentioned, it was entirely untested),
that patch would actually simplify that a lot. Get rid of
ocfs2_destroy_inode() entirely, and just make
ocfs2_rcu_destroy_inode() do that kmem_cache_free(). Mucho clean-up
(we have that ocfs2_rcu_destroy_inode() currently as
ocfs2_i_callback(), but with the ugly rcu-head interface).
> > if (unlikely(inode_init_always(sb, inode))) {
> > - if (inode->i_sb->s_op->destroy_inode)
> > + if (inode->i_sb->s_op->destroy_inode) {
> > inode->i_sb->s_op->destroy_inode(inode);
> > - else
> > + if (!inode->i_sb->s_op->rcu_destroy_inode)
> > + return NULL;
> > + }
> > + if (!inode->i_sb->s_op->rcu_destroy_inode ||
> > + !inode->i_sb->s_op->rcu_destroy_inode(inode))
> > kmem_cache_free(inode_cachep, inode);
>
> ITYM i_callback(inode); here, possibly suitably renamed.
Yeah, I guess we could just call that i_callback() directly. I wrote
it that way because it was how the code was organized (it didn't call
i_callback() before either), but yes, it woudl avoid some duplicate
code to do it that way.
And yes, at that point we'd probably want to rename it too.
On the whole, looking at a few filesystems, I really think that
'rcu_destroy_inode()" would simplify several of them. That ocfs2 case
I already mentioned, but looking around the exact same is trye in
v9fs, vxfs, dlmfs, hostfs, bfs, coda, fatfs, isofs, minix, nfs,
openprom, proc, qnx4, qnx6, sysv, befs, adfs, affs, efs, ext2, f2fs,
gfs2, hfs, hfsplus, hpfs, jffs2, nilfs, reiserfs, romfs, squashfs)
And in other filesystems that actually *do* have a real
destroy_inode() that does other things too, they still want the rcu
delayed part as well (eg btrfs, ceph, fuse, hugetlbfs, afs, ecryptfs,
ext4, jfs, organgefs, ovl, ubifs).
In fact, every single filesystem I looked at (and I looked at most)
would be simplified by that patch.
And for some it would make it easier to fix bugs in the process (ie
bpf) because they don't currently have that rcu delay and they need
it.
So looking at all the existing users of destroy_inode(), I really do
think we should have that rcu_destroy_inode() callback. Because that
3 files changed, 28 insertions(+), 6 deletions(-)
will allow quite a lot of lines to be removed from low-level
filesystems with all the call_rcu() and callback container_of(head,
struct inode, i_rcu) noise just going away.
But yes, it would be good to have more documentation fo the drop/evict
phase. As mentioned, I think the destroy phase is actually the easy
one, with the RCU part being the most complex one that this would
actually simplify.
Linus