Re: KASAN: use-after-free Read in sock_release

From: Al Viro
Date: Wed Nov 29 2017 - 21:07:32 EST


On Wed, Nov 29, 2017 at 12:24:55PM -0800, Linus Torvalds wrote:
> Ugh. The inode freeing really is confusing and fairly involved, but
> the last free *should* happen as part of the final dput() that is done
> at the end of __fput().

Note that struct socket is coallocated with its inode. _Normally_
from sock_alloc() (and that's the case here, apparently), but in several
cases it's embedded into another object. TUN and TAP - definitely,
might have been other added. Those should never be passed to sock_release()
at all.

> So in __fput() calls into the
>
> if (file->f_op->release)
> file->f_op->release(inode, file);
>
> then the inode should still be around, because the final ref won't be
> done until later. And RCU simply shouldn't be an issue, because of
> that reference count on the inode.
>
> So it smells like some reference counting went wrong. The socket inode
> creation is a bit confusing, and then in "sock_release()" we do have
> that
>
> if (!sock->file) {
> iput(SOCK_INODE(sock));
> return;
> }
> sock->file = NULL;
>
> which *also* tries to free the inode. I'm not sure what the logic (and
> what the locking) behind that code all is.

If socket has never gone through sock_alloc_file(), sock_release() on it
is called explicitly and frees the sucker. If it has been through
sock_alloc_file(), we must not call sock_release() directly and freeing
is done by iput() from final fput().

> What *is* the locking for "sock->file" anyway?

Pretty much assign-once - zeroing it in the end of sock_release() is
pure cosmetics (we'd damn better have no other references to that
sucker left anywhere; there's still a reference to embedded inode,
but that's it).

FWIW, looking through the callers of sock_alloc_file()... we might be
better off if it did sock_release() on failure. Then the calling
conventions become "sock_alloc_file() means not calling sock_release()
directly - either it'll be done by the final fput() on resulting file,
or by sock_alloc_file() itself".

Look:
1) in lustre:
sock_filp = sock_alloc_file(sock, 0, NULL);
if (IS_ERR(sock_filp)) {
sock_release(sock);
rc = PTR_ERR(sock_filp);
goto out;
}
2) in net/9p:
file = sock_alloc_file(csocket, 0, NULL);
if (IS_ERR(file)) {
pr_err("%s (%d): failed to map fd\n",
__func__, task_pid_nr(current));
sock_release(csocket);
kfree(p);
return PTR_ERR(file);
}
3) in sctp:
*newfile = sock_alloc_file(newsock, 0, NULL);
if (IS_ERR(*newfile)) {
put_unused_fd(retval);
sock_release(newsock);
retval = PTR_ERR(*newfile);
*newfile = NULL;
return retval;
}
4) in accept4():
newfile = sock_alloc_file(newsock, flags, sock->sk->sk_prot_creator->name);
if (IS_ERR(newfile)) {
err = PTR_ERR(newfile);
put_unused_fd(newfd);
sock_release(newsock);
goto out_put;
}

5) called in sock_map_fd(), and the sole caller is
retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
if (retval < 0)
goto out_release;
...
out_release:
sock_release(sock);
return retval;
(with no fallthrough or other goto into out_release)

6) the second caller in socketpair():
newfile2 = sock_alloc_file(sock2, flags, NULL);
if (IS_ERR(newfile2)) {
err = PTR_ERR(newfile2);
goto out_fput_1;
}
...
out_fput_1:
fput(newfile1);
put_unused_fd(fd2);
put_unused_fd(fd1);
sock_release(sock2);
goto out;
(again, no fallthrough or other goto into out_fput_1)

7) the first caller in socketpair():
newfile1 = sock_alloc_file(sock1, flags, NULL);
if (IS_ERR(newfile1)) {
err = PTR_ERR(newfile1);
goto out_put_unused_both;
}
...
out_put_unused_both:
put_unused_fd(fd2);
out_put_unused_1:
put_unused_fd(fd1);
out_release_both:
sock_release(sock2);
out_release_1:
sock_release(sock1);
out:
return err;
No fallthrough or goto either. Sure, we get a failure exit unshared,
but AFAICS some reordering can simplify things quite a bit there.

8) kcm_clone(). Fucked in head - we allocate socket, then file, *THEN*
sock, then attach sock to socket (already attached to file), then finally
deign to initialize sock (already attached to socket, which is attached
to file). And, surprise, surprise, failure exits are all wrong.
Moreover, calling conventions are broken by design - after we'd put
the damn file into descriptor table we return the pointer to sock
to the caller. By that time it might have bloody well been destroyed
by close(2) from another thread; good thing the caller doesn't use
the damn thing. Unfortunately, it is doing this:
if (!err) {
if (copy_to_user((void __user *)arg, &info,
sizeof(info))) {
err = -EFAULT;
sys_close(info.fd);
}
}
which is also bogus - again, the descriptor might have been already
closed by another thread, with another one put in its place, etc.
The right way to do that is to do fd_install() _last_. After the
last failure exit, i.e. after copy_to_user() in that case.

KCM would've looked as a likely cause of that shit, if not for the
fact that object had been created by socket(2). It definitely needs
fixing, but that's not the cause of PITA here.

Incidentally, grepping for sys_close() shows another piece of fun in
net/netfilter/xt_bpf.c. Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
IT; THERE'S NO REMOVING IT ON FAILURE EXITS. sys_close() should
never, ever be used that way. Sigh...