Re: WARNING: refcount bug in kvm_vm_ioctl

From: Jann Horn
Date: Fri Feb 15 2019 - 12:22:48 EST


On Fri, Feb 15, 2019 at 6:13 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Fri, Feb 15, 2019 at 6:10 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Fri, Feb 15, 2019 at 5:45 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > On Fri, Feb 15, 2019 at 5:03 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > On Fri, Feb 15, 2019 at 4:40 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > > > On Thu, Oct 11, 2018 at 4:18 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > > > > > On 10/10/2018 09:58, syzbot wrote:
> > > > > > > do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
> > > > > > > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:993
> > > > > > > RIP: 0010:refcount_inc_checked+0x5d/0x70 lib/refcount.c:153
> > > > > > > kvm_get_kvm arch/x86/kvm/../../../virt/kvm/kvm_main.c:766 [inline]
> > > > > > > kvm_ioctl_create_device arch/x86/kvm/../../../virt/kvm/kvm_main.c:2924
> > > > > > > kvm_vm_ioctl+0xed7/0x1d40 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3114
> > > > > > > vfs_ioctl fs/ioctl.c:46 [inline]
> > > > > > > file_ioctl fs/ioctl.c:501 [inline]
> > > > > > > do_vfs_ioctl+0x1de/0x1720 fs/ioctl.c:685
> > > > > > > ksys_ioctl+0xa9/0xd0 fs/ioctl.c:702
> > > > > > > __do_sys_ioctl fs/ioctl.c:709 [inline]
> > > > > > > __se_sys_ioctl fs/ioctl.c:707 [inline]
> > > > > > > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:707
> > > > > > > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > > > > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > >
> > > > > > The trace here is fairly simple, but I don't understand how this could
> > > > > > happen.
> > > > > >
> > > > > > The kvm_get_kvm is done within kvm_ioctl_create_device, which is called
> > > > > > from ioctl; the last reference cannot disappear inside a ioctl, because:
> > > > > >
> > > > > > 1) kvm_ioctl is called from vfs_ioctl, which does fdget and holds the fd
> > > > > > reference until after kvm_vm_ioctl returns
> > > > > >
> > > > > > 2) the file descriptor holds one reference to the struct kvm*, and this
> > > > > > reference is not released until kvm_vm_release is called by the last
> > > > > > fput (which could be fdput's call to fput if the process has exited in
> > > > > > the meanwhile)
> > > > > >
> > > > > > 3) for completeness, in case anon_inode_getfd fails, put_unused_fd will
> > > > > > not invoke the file descriptor's ->release callback (in this case
> > > > > > kvm_device_release).
> > > > > >
> > > > > > CCing some random people to get their opinion...
> > > > > >
> > > > > > Paolo
> > > > >
> > > > >
> > > > > Jann, is it what you fixed in "kvm: fix kvm_ioctl_create_device()
> > > > > reference counting (CVE-2019-6974)"?
> > > > > If so, we need to close the syzbot bug.
> > > > >
> > > > >
> > > > > > > # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> > > > > > > #{"threaded":true,"collide":true,"repeat":true,"procs":6,"sandbox":"none","fault_call":-1,"tun":true,"tmpdir":true,"cgroups":true,"netdev":true,"resetnet":true,"segv":true}
> > > > > > > r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000380)='/dev/kvm\x00', 0x0, 0x0)
> > > > > > > r1 = syz_open_dev$dspn(&(0x7f0000000100)='/dev/dsp#\x00', 0x3fe, 0x400)
> > > > > > > r2 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
> > > >
> > > > Here we create a VM fd...
> > > >
> > > > > > > perf_event_open(&(0x7f0000000040)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x50d}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0)
> > > > > > > mincore(&(0x7f0000ffc000/0x1000)=nil, 0x1000, &(0x7f00000003c0)=""/4096)
> > > > > > > setrlimit(0x0, &(0x7f0000000000))
> > > > > > > readahead(r1, 0x3, 0x9a6)
> > > > > > > ioctl$KVM_CREATE_DEVICE(r2, 0xc00caee0, &(0x7f00000002c0)={0x4})
> > > >
> > > > ... and here we do the KVM_CREATE_DEVICE ioctl with type==KVM_DEV_TYPE_VFIO.
> > > >
> > > > So that far it looks exactly like CVE-2019-6974. But CVE-2019-6974
> > > > also requires that someone calls close() on the file descriptor of the
> > > > newly created device very quickly, before the ioctl is able to
> > > > increment the refcount further, and I don't see anything like that
> > > > here. Is there a chance that syzkaller called close() on a file
> > > > descriptor while the ioctl() was still running without saying so here
> > > > (potentially through dup2() or something like that)?
> > >
> > > Yes, all fd's are closed at the end of the test:
> > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L2561-L2568
> >
> > Can that happen before the ioctl() has finished?
>
> Yes, ioctl runs in a separate thread.

Alright, then yes, it looks like the same bug.

Since the root cause wasn't identified from the original syzkaller
report, I wonder whether it would make sense to add infrastructure
that makes it easier to identify the root cause from a syzkaller
report; here are some random ideas:

- A comment at the end of the syzkaller reproducer that lists
  interesting syscalls that are performed implicitly, in particular
  close(3..30). Without that information, the race isn't really visible
  here.
- A config option that allows recording (subsets of) stacks, PIDs,
  and CPU numbers in every function that modifies a refcount, and can
  dump the last N such records when a refcount detects an error. This
  would probably be helpful for figuring out refcounting bugs, but I
  don't actually know how many of the bugs syzkaller finds are
  refcounting-related - if it's not a lot, this might not be worth the
  effort.
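
To make the second idea concrete, here is a minimal userspace sketch of a
refcount that remembers its last N operations and dumps them when it
detects an error. This is not kernel code and does not use the real
refcount_t API: the names tracked_ref, ref_inc, ref_dec and ref_dump are
hypothetical, the "tid" and "site" fields merely stand in for PID/CPU and
a captured stack trace, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HIST_LEN 8 /* N: how many recent operations to remember */

enum ref_op { REF_INC, REF_DEC };

/* One history entry (hypothetical layout). */
struct ref_record {
	enum ref_op op;
	int tid;             /* stand-in for PID / CPU number */
	const char *site;    /* stand-in for a captured stack trace */
	unsigned int before; /* counter value before the operation */
};

struct tracked_ref {
	unsigned int count;
	unsigned long total;              /* operations logged so far */
	struct ref_record hist[HIST_LEN]; /* ring buffer of recent ops */
};

/* Append one operation to the ring buffer, overwriting the oldest. */
static void ref_log(struct tracked_ref *r, enum ref_op op, int tid,
		    const char *site)
{
	struct ref_record *rec = &r->hist[r->total % HIST_LEN];

	rec->op = op;
	rec->tid = tid;
	rec->site = site;
	rec->before = r->count;
	r->total++;
}

/* Print the remembered operations, oldest first; return how many. */
static unsigned int ref_dump(const struct tracked_ref *r)
{
	unsigned long n = r->total < HIST_LEN ? r->total : HIST_LEN;

	assert(n <= HIST_LEN);
	for (unsigned long i = r->total - n; i < r->total; i++) {
		const struct ref_record *rec = &r->hist[i % HIST_LEN];

		fprintf(stderr, "  #%lu %s tid=%d site=%s count_before=%u\n",
			i, rec->op == REF_INC ? "inc" : "dec",
			rec->tid, rec->site, rec->before);
	}
	return (unsigned int)n;
}

/* Take a reference; an increment from zero is reported as an error. */
static bool ref_inc(struct tracked_ref *r, int tid, const char *site)
{
	ref_log(r, REF_INC, tid, site);
	if (r->count == 0) {
		fprintf(stderr, "tracked_ref: increment on 0, history:\n");
		ref_dump(r);
		return false;
	}
	r->count++;
	return true;
}

/* Drop a reference; a decrement from zero is reported as an error. */
static bool ref_dec(struct tracked_ref *r, int tid, const char *site)
{
	ref_log(r, REF_DEC, tid, site);
	if (r->count == 0) {
		fprintf(stderr, "tracked_ref: underflow, history:\n");
		ref_dump(r);
		return false;
	}
	r->count--;
	return true;
}
```

With something like this, a failing ref_inc() would print which thread
dropped the last reference and from where, which is the piece of
information that was missing from the original report. A real
implementation would additionally need locking or per-CPU buffers and a
cheap way to capture stacks.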