Re: [RFC PATCH] KVM: race-free exit from KVM_RUN without POSIX signals

From: Radim Krčmář
Date: Wed Feb 08 2017 - 08:26:51 EST


2017-02-08 12:10+0100, Paolo Bonzini:
> The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
> a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
> to a dummy signal handler; by blocking the signal outside KVM_RUN and
> unblocking it inside, this possible race is closed:
>
>         VCPU thread                     service thread
>    --------------------------------------------------------------
>         check flag
>                                         set flag
>                                         raise signal
>         (signal handler does nothing)
>         KVM_RUN
>
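
For readers who want to poke at the ordering above, here is a minimal
single-threaded userspace model of it. It is only a sketch: it does not
touch /dev/kvm, mock_kvm_run() and kick_setup() are hypothetical names,
and "unblocking inside KVM_RUN" is modelled by checking the pending set
and briefly unblocking the signal. A kick raised while the signal is
blocked stays pending, so the next run still sees it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Dummy handler: the signal only exists to interrupt the run. */
static void kick_handler(int sig)
{
	(void)sig;
}

/* Install the dummy handler and block the kick signal outside the run. */
static int kick_setup(int signo)
{
	struct sigaction sa;
	sigset_t set;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = kick_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) < 0)
		return -1;

	sigemptyset(&set);
	sigaddset(&set, signo);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Hypothetical stand-in for KVM_RUN with a signal mask installed:
 * returns 0 if the mock guest "ran", -EINTR if a kick was pending.
 */
static int mock_kvm_run(int signo)
{
	sigset_t pending, kick;
	int member;

	if (sigpending(&pending) < 0)
		return -errno;

	member = sigismember(&pending, signo);
	assert(member >= 0);	/* signo must be a valid signal number */
	if (!member)
		return 0;

	/* Unblock briefly so the dummy handler runs and consumes the kick. */
	sigemptyset(&kick);
	sigaddset(&kick, signo);
	sigprocmask(SIG_UNBLOCK, &kick, NULL);
	sigprocmask(SIG_BLOCK, &kick, NULL);
	return -EINTR;
}
```

Calling kick_setup(SIGUSR1), then raise(SIGUSR1), then
mock_kvm_run(SIGUSR1) models a kick that lands between "check flag" and
KVM_RUN: the run reports -EINTR instead of losing the kick.
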
> However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
> tsk->sighand->siglock on every KVM_RUN. This lock is often on a
> remote NUMA node, because it is on the node of a thread's creator.
> Taking this lock can be very expensive if there are many userspace
> exits (as is the case for SMP Windows VMs without Hyper-V reference
> time counter).
>
> As an alternative, we can put the flag directly in kvm_run so that
> KVM can see it:
>
>         VCPU thread                     service thread
>    --------------------------------------------------------------
>                                         raise signal
>         signal handler
>           set run->immediate_exit
>         KVM_RUN
>           check run->immediate_exit
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> @@ -2564,9 +2565,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  			synchronize_rcu();
>  			put_pid(oldpid);
>  		}
> -		r = kvm_arch_vcpu_ioctl_run(vcpu, vcpu->run);
> -		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
> +		run = vcpu->run;
> +		if (run->immediate_exit) {
> +			WRITE_ONCE(run->immediate_exit, 0);
> +			return -EINTR;
> +		}

QEMU also uses self-kick to complete IO, but run->immediate_exit is
checked too soon for that. I think we should move it at least into
kvm_arch_vcpu_ioctl_run(), to cover two uses of the interrupt mask.
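
To make the ordering concern concrete, here is a small userspace model,
not kernel code. Everything in it is an assumption made for
illustration: struct mock_run and struct mock_vcpu stand in for the real
structures, and mock_complete_io() stands in for whatever the arch code
does to finish an I/O exit that userspace has already serviced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared kvm_run page. */
struct mock_run {
	uint8_t immediate_exit;	/* set by userspace to request an exit */
};

/* Hypothetical stand-in for per-VCPU state. */
struct mock_vcpu {
	struct mock_run run;
	bool io_pending;	/* an I/O exit still needs completing */
	int guest_entries;	/* how often the mock guest actually ran */
};

/* Finish an I/O exit that userspace has already serviced. */
static void mock_complete_io(struct mock_vcpu *vcpu)
{
	assert(vcpu->io_pending);
	vcpu->io_pending = false;
}

/* Check at ioctl entry, as in the RFC hunk: pending I/O stays pending. */
static int mock_run_early_check(struct mock_vcpu *vcpu)
{
	if (vcpu->run.immediate_exit) {
		vcpu->run.immediate_exit = 0;
		return -EINTR;
	}
	if (vcpu->io_pending)
		mock_complete_io(vcpu);
	vcpu->guest_entries++;
	return 0;
}

/* Check after I/O completion: the self-kick finishes the I/O, then exits. */
static int mock_run_late_check(struct mock_vcpu *vcpu)
{
	if (vcpu->io_pending)
		mock_complete_io(vcpu);
	if (vcpu->run.immediate_exit) {
		vcpu->run.immediate_exit = 0;
		return -EINTR;
	}
	vcpu->guest_entries++;
	return 0;
}
```

With immediate_exit set and io_pending true, both variants return -EINTR
without entering the guest, but only the late check leaves io_pending
cleared, which is what a self-kick to complete I/O relies on.
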

(I don't remember the reason behind QEMU's mask on SIGBUS any more.)

Thanks.

> +		r = kvm_arch_vcpu_ioctl_run(vcpu, run);
> +		trace_kvm_userspace_exit(run->exit_reason, r);
>  		break;
> +	}
>  	case KVM_GET_REGS: {
>  		struct kvm_regs *kvm_regs;
>