Re: [PATCH v5 1/8] KVM: Clean up benign vcpu->cpu data races when kicking vCPUs

From: Lai Jiangshan
Date: Fri Sep 03 2021 - 20:52:44 EST


On Fri, Sep 3, 2021 at 3:52 PM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu
> is read exactly once, and by ensuring the vCPU is booted from guest mode
> if kvm_arch_vcpu_should_kick() returns true. Fix a similar race in
> kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if
> kvm_request_needs_ipi() returns true.
>
> Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or
> kvm_request_needs_ipi()) means the target vCPU could get migrated (change
> vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpud and
> reading vcpu->mode. If that happens, the kick/IPI will be sent to the
> old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs.
>
> Although failing to kick the vCPU is not exactly ideal, practically
> speaking it cannot cause a functional issue unless there is also a bug in
> the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s
> behavior.
>
> The purpose of sending an IPI is purely to get a vCPU into the host (or
> out of reading SPTEs) so that the vCPU can recognize a change in state,
> e.g. a KVM_REQ_* request. If vCPU's handling of the state change is
> required for correctness, KVM must ensure either the vCPU sees the change
> before entering the guest, or that the sender sees the vCPU as running in
> guest mode. All architectures handle this by (a) sending the request
> before calling kvm_vcpu_kick() and (b) checking for requests _after_
> setting vcpu->mode.
>
> x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to
> ensure it kicks and waits for vCPUs that started reading SPTEs _before_
> MMU changes were finalized, but any vCPU that starts reading after MMU
> changes were finalized will see the new state and can continue on
> uninterrupted.
>
> For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g.
> x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied
> upon for functional correctness, e.g. in the dirty log case, userspace
> cannot assume it has a 100% complete log if vCPUs are still running.
>
> All that said, eliminate the benign race since the cost of doing so is an
> "extra" atomic cmpxchg() in the case where the target vCPU is loaded by
> the current pCPU or is not loaded at all. I.e. the kick will be skipped
> due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as
> opposed to the kick being skipped because of the cpu checks.
>
> Keep the "cpu != me" checks even though they appear useless/impossible at
> first glance. x86 processes guest IPI writes in a fast path that runs in
> IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE. And
> calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST or
> READING_SHADOW_PAGE_TABLES is perfectly reasonable.
>
> Note, a race with the cpu_online() check in kvm_vcpu_kick() likely
> persists, e.g. the vCPU could exit guest mode and get offlined between
> the cpu_online() check and the sending of smp_send_reschedule(). But,
> the online check appears to exist only to avoid a WARN in x86's
> native_smp_send_reschedule() that fires if the target CPU is not online.
> The reschedule WARN exists because CPU offlining takes the CPU out of the
> scheduling pool, i.e. the WARN is intended to detect the case where the
> kernel attempts to schedule a task on an offline CPU. The actual sending
> of the IPI is a non-issue as at worst it will simpy be dropped on the
> floor. In other words, KVM's usurping of the reschedule IPI could
> theoretically trigger a WARN if the stars align, but there will be no
> loss of functionality.
>
> [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a
>
> Cc: Venkatesh Srinivas <venkateshs@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Fixes: 97222cc83163 ("KVM: Emulate local APIC in kernel")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

Reviewed-by: Lai Jiangshan <jiangshanlai@xxxxxxxxx>

> ---
> virt/kvm/kvm_main.c | 36 ++++++++++++++++++++++++++++--------
> 1 file changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3e67c93ca403..786b914db98f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -273,14 +273,26 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
> continue;
>
> kvm_make_request(req, vcpu);
> - cpu = vcpu->cpu;
>
> if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
> continue;
>
> - if (tmp != NULL && cpu != -1 && cpu != me &&
> - kvm_request_needs_ipi(vcpu, req))
> - __cpumask_set_cpu(cpu, tmp);
> + /*
> + * Note, the vCPU could get migrated to a different pCPU at any
> + * point after kvm_request_needs_ipi(), which could result in
> + * sending an IPI to the previous pCPU. But, that's ok because
> + * the purpose of the IPI is to ensure the vCPU returns to
> + * OUTSIDE_GUEST_MODE, which is satisfied if the vCPU migrates.
> + * Entering READING_SHADOW_PAGE_TABLES after this point is also
> + * ok, as the requirement is only that KVM wait for vCPUs that
> + * were reading SPTEs _before_ any changes were finalized. See
> + * kvm_vcpu_kick() for more details on handling requests.
> + */
> + if (tmp != NULL && kvm_request_needs_ipi(vcpu, req)) {
> + cpu = READ_ONCE(vcpu->cpu);
> + if (cpu != -1 && cpu != me)
> + __cpumask_set_cpu(cpu, tmp);
> + }
> }
>
> called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
> @@ -3309,16 +3321,24 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
> */
> void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
> {
> - int me;
> - int cpu = vcpu->cpu;
> + int me, cpu;
>
> if (kvm_vcpu_wake_up(vcpu))
> return;
>
> + /*
> + * Note, the vCPU could get migrated to a different pCPU at any point
> + * after kvm_arch_vcpu_should_kick(), which could result in sending an
> + * IPI to the previous pCPU. But, that's ok because the purpose of the
> + * IPI is to force the vCPU to leave IN_GUEST_MODE, and migrating the
> + * vCPU also requires it to leave IN_GUEST_MODE.
> + */
> me = get_cpu();
> - if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> - if (kvm_arch_vcpu_should_kick(vcpu))
> + if (kvm_arch_vcpu_should_kick(vcpu)) {
> + cpu = READ_ONCE(vcpu->cpu);
> + if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
> smp_send_reschedule(cpu);
> + }
> put_cpu();
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
> --
> 2.31.1
>