Re: [PATCH 2/2] KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN

From: Sean Christopherson
Date: Thu Jan 13 2022 - 17:33:49 EST


On Mon, Jan 03, 2022, Igor Mammedov wrote:
> On Mon, 03 Jan 2022 09:04:29 +0100
> Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> > Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:
> >
> > > On 12/27/21 18:32, Igor Mammedov wrote:
> > >>> Tweaked and queued nevertheless, thanks.
> > >> it seems this patch breaks VCPU hotplug, in scenario:
> > >>
> > >> 1. hotunplug existing VCPU (QEMU stores VCPU file descriptor in parked cpus list)
> > >> 2. hotplug it again (unsuspecting QEMU reuses stored file descriptor when recreating VCPU)
> > >>
> > >> RHBZ:https://bugzilla.redhat.com/show_bug.cgi?id=2028337#c11
> > >>
> > >
> > > The fix here would be (in QEMU) to not call KVM_SET_CPUID2 again.
> > > However, we need to work around it in KVM, and allow KVM_SET_CPUID2 if
> > > the data passed to the ioctl is the same that was set before.
> >
> > Are we sure the data is going to be *exactly* the same? In particular,
> > when using vCPU fds from the parked list, do we keep the same
> > APIC/x2APIC id when hotplugging? Or can we actually hotplug with a
> > different id?
>
> If I recall it right, it can be a different ID easily.

No, it cannot. KVM doesn't provide a way for userspace to change the APIC ID of
a vCPU after the vCPU is created. x2APIC flat out disallows changing the APIC ID,
and unless there's magic I'm missing, apic_mmio_write() => kvm_lapic_reg_write()
is not reachable from userspace.

The only way for userspace to set the APIC ID is to change vcpu->vcpu_id, and that
can only be done at KVM_VCPU_CREATE.

So, reusing a parked vCPU for hotplug must reuse the same APIC ID. QEMU handles
this by stashing the vcpu_id, a.k.a. APIC ID, when parking a vCPU, and reuses a
parked vCPU if and only if it has the same APIC ID. And because QEMU derives the
APIC ID from topology, that means all the topology CPUID leafs must remain the
same, otherwise the guest is hosed because it will send IPIs to the wrong vCPUs.

static int do_kvm_destroy_vcpu(CPUState *cpu)
{
struct KVMParkedVcpu *vcpu = NULL;

...

vcpu = g_malloc0(sizeof(*vcpu));
vcpu->vcpu_id = kvm_arch_vcpu_id(cpu); <=== stash the APIC ID when parking
vcpu->kvm_fd = cpu->kvm_fd;
QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
err:
return ret;
}

static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
{
struct KVMParkedVcpu *cpu;

QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
if (cpu->vcpu_id == vcpu_id) { <=== reuse if APIC ID matches
int kvm_fd;

QLIST_REMOVE(cpu, node);
kvm_fd = cpu->kvm_fd;
g_free(cpu);
return kvm_fd;
}
}

return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
}