Re: fs: use-after-free in path_lookupat
From: Dmitry Vyukov
Date: Mon Mar 06 2017 - 04:47:45 EST
On Sun, Mar 5, 2017 at 8:18 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Sun, Mar 05, 2017 at 06:33:18PM +0100, Dmitry Vyukov wrote:
>
>> Added more debug output.
>>
>> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
>> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
>> 0x1000)
>>
>> actually passes name="" because of the overlapping addresses. Flags
>> contain AT_EMPTY_PATH.
>
> Bloody hell... So you end up with name == (char *)&handle->handle_type + 3?
> Looks like it would be a lot more useful to dump the actual contents of
> those suckers right before the syscall...
We can't dump them yet: dumping is the opposite of generation, and we
don't have enough info for it. Strace can do it, but note that it does
not necessarily tell you the truth. First, the kernel can overwrite
some of the inputs with copy_to_user before reading them. Second,
racing syscalls that use the same memory for inputs lead to
non-deterministic inputs, so what you see in strace is not necessarily
what the kernel sees.
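For reference, the overlap in the name_to_handle_at() call quoted above
can be modelled in plain userspace C. This is only a sketch under stated
assumptions: struct handle_model and build_overlapping_args() are
hypothetical names, the model copies only the field layout of
struct file_handle, and it assumes the name bytes are written before the
handle bytes (which is what the observed name="" implies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: same field layout as struct file_handle
 * (two 4-byte fields followed by the payload bytes). */
struct handle_model {
	uint32_t handle_bytes;
	int32_t handle_type;
	unsigned char f_handle[2];
};

/*
 * Place the two arguments the way the generated program does:
 * name at (end - 0x6), handle at (end - 0xd). The caller must provide
 * at least 0xd writable bytes before 'end'.
 * Assumption: the name is written first, the handle second.
 */
static const char *build_overlapping_args(unsigned char *end)
{
	unsigned char *name = end - 0x6;
	unsigned char *handle = end - 0xd;
	struct handle_model h = {
		.handle_bytes = 0xc,
		.handle_type = 0x0,
		.f_handle = { 0xcd, 0x21 },
	};

	/* name == (char *)&handle->handle_type + 3 */
	assert((size_t)(name - handle) ==
	       offsetof(struct handle_model, handle_type) + 3);

	memcpy(name, "./bus", 6);	/* 2e 2f 62 75 73 00 */
	memcpy(handle, &h, 10);		/* 4 + 4 + 2 bytes; overwrites name[0..2] */
	return (const char *)name;
}
```

With a 16-byte buffer buf, build_overlapping_args(buf + 16) returns a
pointer whose first byte is the last byte of handle_type, i.e. zero, so
the string is empty regardless of endianness.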
> Anyway, that explains WTF is going on. The bug is in path_init() and
> it triggers when you pass something with dentry allocated by d_alloc_pseudo()
> as dfd, combined with empty pathname. You need to have the file closed
> by another thread, and have that another thread get out of closing syscall
> (close(), dup2(), etc.) before the caller of path_init() gets to
> complete_walk(). We need to make sure that this sucker gets DCACHE_RCUACCESS
> while it's still guaranteed to be pinned down. Could you try to reproduce
> with the patch below applied?
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 6f7d96368734..70840281a41c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2226,11 +2226,16 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
>  		nd->path = f.file->f_path;
>  		if (flags & LOOKUP_RCU) {
>  			rcu_read_lock();
> -			nd->inode = nd->path.dentry->d_inode;
> -			nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
> +			if (unlikely(!(dentry->d_flags & DCACHE_RCUACCESS))) {
> +				spin_lock(&dentry->d_lock);
> +				dentry->d_flags |= DCACHE_RCUACCESS;
> +				spin_unlock(&dentry->d_lock);
> +			}
> +			nd->inode = dentry->d_inode;
> +			nd->seq = read_seqcount_begin(&dentry->d_seq);
>  		} else {
>  			path_get(&nd->path);
> -			nd->inode = nd->path.dentry->d_inode;
> +			nd->inode = dentry->d_inode;
>  		}
>  		fdput(f);
>  		return s;
This seems to fix the crash. The reproducer has survived for an hour,
while it usually crashes within 5 minutes or so.
But we will get back to you with data race reports later. All
unprotected accesses should use READ_ONCE/WRITE_ONCE.
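To illustrate what is meant by marked accesses, here is a rough
userspace sketch. The macros below are simplified stand-ins for the
kernel's READ_ONCE/WRITE_ONCE (scalar types only, relying on the
GCC/Clang __typeof__ extension); struct dentry_model, MODEL_RCUACCESS
and the helper functions are hypothetical and are not a proposal for
fs/namei.c.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros; scalar types only. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical flag value and structure, for illustration only. */
#define MODEL_RCUACCESS	0x1u

struct dentry_model {
	unsigned int d_flags;
};

/* Lockless check: the read is marked so it is performed exactly once. */
static int model_has_rcuaccess(struct dentry_model *d)
{
	return (READ_ONCE(d->d_flags) & MODEL_RCUACCESS) != 0;
}

/*
 * Update side. In real code the caller would hold the lock protecting
 * d_flags; the store is still marked because lockless readers exist.
 */
static void model_set_rcuaccess(struct dentry_model *d)
{
	WRITE_ONCE(d->d_flags, READ_ONCE(d->d_flags) | MODEL_RCUACCESS);
}

/* Small self-check of the helpers above. */
static void model_selfcheck(void)
{
	struct dentry_model d = { 0 };

	assert(!model_has_rcuaccess(&d));
	model_set_rcuaccess(&d);
	assert(model_has_rcuaccess(&d));
}
```

The volatile casts only stop the compiler from tearing, fusing or
repeating the access; they provide no ordering and do not make the
read-modify-write atomic, so the update side still needs its lock.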