Re: [PATCH] KVM: release anon file in failure path of vm creation

From: Liu Shuo
Date: Fri Jul 15 2016 - 03:04:05 EST

On Fri 15.Jul'16 at 6:09:59 +0100, Al Viro wrote:
On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote:

If there is no such thread (one that operates on the descriptor by
guessing its number), I think the change is safe at that point, as the fd
has not yet been delivered to userspace. Am I right?

Expecting nice behaviour from userland code is something best avoided, really.

Got it! :)

All jokes aside, this other thread doesn't have to be malicious - just being
buggy would suffice. Besides, you never know if something like userns won't
be dumped into the kernel, making your ioctl accessible to genuinely
malicious code.

The only sane approach is to treat descriptor tables as shared data structures
and postpone the insertion of struct file reference into descriptor table
until you are past all failure exits. Including the ones related to copying
to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated
with it, reserves two descriptors, copies them into userland array and only if
everything has succeeded proceeds to fd_install(). In your case passing the
descriptor to userland is not an issue (return value of ioctl(2) goes via
register and that can't fail), so the last failure exit is that after failed
attempt to create debugfs stuff. We have to reserve the descriptor before
that (it's used as a part of debugfs directory name), so anon_inode_getfd()
is not an option - it combines reserving descriptor with fd_install().
Such situations are exactly the reason why anon_inode_getfile() is there;
anon_inode_getfd() is usable only when it is the very last thing we do
before returning the descriptor to userland.

Really appreciate your exhaustive explanations. Thanks.
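
To make sure I have the ordering right, here is a minimal userspace toy model
of it (not kernel code). All toy_* names are hypothetical and exist only for
this sketch. The comments name the kernel primitives each helper is meant to
stand in for; get_unused_fd_flags(), put_unused_fd() and fput() are my
assumption about what sits behind anon_inode_getfd(), they were not spelled
out above. The point is only that the descriptor number is reserved early,
while the file becomes visible in the table after the last failure exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TABLE_SIZE 8

/* Toy stand-in for struct file: just a reference count. */
struct toy_file {
    int refcount;
};

/* Toy descriptor table: a slot can be reserved without being visible. */
struct toy_fdtable {
    bool reserved[TABLE_SIZE];
    struct toy_file *installed[TABLE_SIZE];
};

/* Stands in for get_unused_fd_flags(): reserve a number, publish nothing. */
static int toy_reserve_fd(struct toy_fdtable *t)
{
    for (int fd = 0; fd < TABLE_SIZE; fd++) {
        if (!t->reserved[fd]) {
            t->reserved[fd] = true;
            return fd;
        }
    }
    return -1;
}

/* Stands in for put_unused_fd(): release a reserved, never-installed slot. */
static void toy_unreserve_fd(struct toy_fdtable *t, int fd)
{
    assert(t->reserved[fd] && t->installed[fd] == NULL);
    t->reserved[fd] = false;
}

/* Stands in for fd_install(): after this, other threads can find the file. */
static void toy_install_fd(struct toy_fdtable *t, int fd, struct toy_file *f)
{
    assert(t->reserved[fd] && t->installed[fd] == NULL);
    t->installed[fd] = f;
}

/* Stands in for fput(): drop the creator's reference. */
static void toy_put_file(struct toy_file *f)
{
    f->refcount--;
}

/* Hypothetical fallible step that needs the fd number to build a name. */
static int toy_setup_debug_dir(int fd, bool should_fail, char *name, size_t len)
{
    if (should_fail)
        return -1;
    snprintf(name, len, "vm-%d", fd);
    return 0;
}

/* Reserve, create, run every fallible step, and only then install. */
int toy_create_vm(struct toy_fdtable *t, struct toy_file *f,
                  bool debug_fails, char *name, size_t len)
{
    int fd = toy_reserve_fd(t);
    if (fd < 0)
        return -1;

    f->refcount = 1; /* stands in for anon_inode_getfile() */

    if (toy_setup_debug_dir(fd, debug_fails, name, len) < 0) {
        /* Nothing was published, so unwinding cannot race with anyone. */
        toy_put_file(f);
        toy_unreserve_fd(t, fd);
        return -1;
    }

    toy_install_fd(t, fd, f); /* past all failure exits */
    return fd;
}
```

Calling toy_create_vm() with debug_fails set to true leaves the table exactly
as it was before the call, and no other user of the table could ever have seen
the file. With debug_fails set to false, the file appears in the table only as
the final step.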

FWIW, original code was not unreasonable - it simply treated debugfs stuff as
optional and ignored those failures. That way anon_inode_getfd() is fine -
there's no failure exits after it. If we want to fail when debugfs had
been enabled and we'd failed to populate it, we need to use the real primitives
behind anon_inode_getfd(), though.

So my first, immature idea comes back.
Is the fd in the debugfs name really that useful? Do we need it here at all?
An alternative approach is to put the fd into debugfs; then the code
stays clean here, as we can call anon_inode_getfd() at the very end of the