Re: KASAN: use-after-free Read in path_lookupat

From: Linus Torvalds
Date: Sun Mar 24 2019 - 21:23:49 EST


Hmm.

Al, this one seems real and also seems pretty nasty from a vfs
interface standpoint.

On Wed, Nov 28, 2018 at 9:40 AM syzbot
<syzbot+7a8ba368b47fdefca61e@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline]
> BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
> ...
> lookup_last fs/namei.c:2269 [inline]
> path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318

Ok, path lookup using RCU.

> Allocated by task 9424:
> kstrdup+0x39/0x70 mm/util.c:49
> bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356

It's the symlink data for the bpf filesystem. Fair enough.

> Freed by task 9425:
> kfree+0xcf/0x230 mm/slab.c:3817
> bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565
> evict+0x4b9/0x980 fs/inode.c:558
> iput_final fs/inode.c:1550 [inline]
> iput+0x674/0xa90 fs/inode.c:1576
> do_unlinkat+0x733/0xa30 fs/namei.c:4069

.. and the path lookup is racing with the final unlink.

We actually RCU-delay the inode freeing itself, but when we do the
final iput(), the "evict()" function is called synchronously.

Now, the simple fix would seem to just RCU-delay the kfree() of the
symlink data in bpf_evict_inode(). Maybe that's the right thing to do.
I'd call it trivial, except you'd need to have that rcu head to attach
it to, making it slightly less trivial. I guess the kmalloc could just
malloc that too.

But it does strike me that this might be a generic issue. I wonder how
many other filesystems do this?

Alexei, Daniel - the "proper" fix right is probably one of four cases:

(a) rcu-delay the freeing, and use simple_symlink_inode_operations().
No set_delayed_call() needed, because it stays around for the inode
lifetime, but the RCU-delaying is still needed for lookup.

or

(b) put the symlink in a page, and use page_symlink_inode_operations
for that inode, where we have a "struct delayed_call" and get a page
ref

or

(c) put the symlink in the inode itself, and then use that
simple_symlink_inode_operations (and now the RCU-delaying of the inode
protects it).

of

(d) make a special getlink() that does "copy symlink,
set_delayed_call(done, kfree_link, copy)"

but I do wonder if we should perhaps just make
simple_symlink_inode_operations do that (d) case for people.

Al, comments? At the very least, if we don't make
simple_symlink_inode_operations() do that, we should have a *big*
comment that if it's not part of the inode data, it needs to be
RCU-delayed.

Or maybe we could add a final inode callback function for "rcu_free"
that is called as the RCU-delayed freeing of the inode itself happens?
And then people could hook into that for freeing the inode->i_link
data.

So many choices.. But the current situation seems unnecessarily
complex for the filesystem, and isn't really documented.

Our documentation currently says for get_link(): "If the body won't go
away until the inode is gone, nothing else is needed", which is wrong
(or at least very misleading, since the last "inode is gone" callback
we have is that evict() function).

Linus