Re: [PATCH] KVM: release anon file in failure path of vm creation

From: Al Viro
Date: Fri Jul 15 2016 - 01:10:10 EST


On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote:

> If there is no such thread (who operates the descriptor based on
> guessing), i can think the changing is safe at the point. As the fd has
> not been delivered to userspace. Am i right?

<wry>
Expecting nice behaviour from userland code is something best avoided, really.
</wry>

All jokes aside, this other thread doesn't have to be malicious - just being
buggy would suffice. Besides, you never know if something like userns won't
be dumped into the kernel, making your ioctl accessible to genuinely
malicious code.

The only sane approach is to treat descriptor tables as shared data structures
and postpone the insertion of struct file reference into descriptor table
until you are past all failure exits. Including the ones related to copying
to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated
with it, reserves two descriptors, copies them into userland array and only if
everything has succeeded proceeds to fd_install(). In your case passing the
descriptor to userland is not an issue (return value of ioctl(2) goes via
register and that can't fail), so the last failure exit is that after failed
attempt to create debugfs stuff. We have to reserve the descriptor before
that (it's used as a part of debugfs directory name), so anon_inode_getfd()
is not an option - it combines reserving descriptor with fd_install().
Such situations are exactly the reason why anon_inode_getfile() is there;
anon_inode_getfd() is usable only when it is the very last thing we do
before returning the descriptor to userland.
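
To make the ordering concrete, here is a minimal userspace model of it. This is
a sketch, not kernel code: every toy_* name is a hypothetical stand-in invented
for illustration, and the comments say which real primitive each one is meant to
mirror (get_unused_fd_flags(), anon_inode_getfile(), put_unused_fd(), fput(),
fd_install()). The point it shows is that the slot is reserved early, because
the number is needed, but nothing is published in the table until the last
failure exit is behind us.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Toy stand-in for struct file: only a reference count. */
struct toy_file { int refs; };

/* Toy descriptor table: a slot can be reserved while still pointing nowhere. */
static struct toy_file *slots[TABLE_SIZE];
static bool reserved[TABLE_SIZE];

/* Models get_unused_fd_flags(): claim a number, publish nothing. */
static int toy_reserve_fd(void)
{
    for (int fd = 0; fd < TABLE_SIZE; fd++) {
        if (!reserved[fd]) {
            reserved[fd] = true;
            return fd;
        }
    }
    return -EMFILE;
}

/* Models put_unused_fd(): give back a number that was never published. */
static void toy_put_unused_fd(int fd)
{
    assert(reserved[fd] && slots[fd] == NULL);
    reserved[fd] = false;
}

/* Models fd_install(): point of no return, the file is now visible. */
static void toy_fd_install(int fd, struct toy_file *f)
{
    assert(reserved[fd] && slots[fd] == NULL);
    slots[fd] = f;
}

/* Models fput(): drop the reference we were holding. */
static void toy_fput(struct toy_file *f)
{
    f->refs--;
}

/* Hypothetical fallible step that needs the fd number (think: debugfs name). */
static int toy_setup_using_fd(int fd, bool fail)
{
    (void)fd;
    return fail ? -ENOMEM : 0;
}

/* Reserve, get the file, do everything that can fail, and only then install. */
static int toy_create(struct toy_file *f, bool fail_late)
{
    int fd = toy_reserve_fd();
    if (fd < 0)
        return fd;

    f->refs = 1; /* models anon_inode_getfile() handing us a reference */

    int err = toy_setup_using_fd(fd, fail_late);
    if (err < 0) {
        /* Nothing was published, so unwinding is purely local. */
        toy_put_unused_fd(fd);
        toy_fput(f);
        return err;
    }

    toy_fd_install(fd, f); /* last step, no failure exits after this */
    return fd;
}
```

Calling toy_create() with fail_late set leaves the table exactly as it was:
the slot is neither reserved nor populated, so there is nothing for another
thread to guess at. Had the install been done before the fallible step, the
unwind would have to pull a live entry back out of a shared table.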

FWIW, original code was not unreasonable - it simply treated debugfs stuff as
optional and ignored those failures. That way anon_inode_getfd() is fine -
there are no failure exits after it. If we want to fail when debugfs has
been enabled and we have failed to populate it, we need to use the real
primitives behind anon_inode_getfd(), though.