Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

From: Suthikulpanit, Suravee
Date: Wed Sep 14 2022 - 22:11:55 EST


Sean,

On 9/14/2022 2:42 AM, Sean Christopherson wrote:
On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
Hi Sean

On 9/2/2022 7:22 PM, Sean Christopherson wrote:
Disable the optimized APIC logical map if multiple vCPUs are aliased to
the same logical ID. Architecturally, all CPUs whose logical ID matches
the MDA are supposed to receive the interrupt; overwriting existing map
entries can result in missed IPIs.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
---
arch/x86/kvm/lapic.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6b2f538b8fd0..75748c380ceb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!mask)
continue;
- if (!is_power_of_2(mask)) {
+ ldr = ffs(mask) - 1;
+ if (!is_power_of_2(mask) || cluster[ldr]) {

Should this be checking if the cluster[ldr] is pointing to the same struct
apic instead? For example:

if (!is_power_of_2(mask) || cluster[ldr] != apic)

From my observation, the kvm_recalculate_apic_map() can be called many
times, and the cluster[ldr] could have already been assigned from the
previous invocation. So, as long as it is the same, it should be okay.

No, because cluster[ldr] can never match "apic". kvm_recalculate_apic_map()
creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
update of the current map.

Yes, the _new_ is getting created and initialized every time we call kvm_recalculate_apic_map(), and then passed into kvm_apic_map_get_logical_dest() along with the reference of cluster and mask to get populated based on the provided ldr. Please note that the new->phys_map[x2apic_id] is already assigned before the calling of kvm_apic_map_get_logical_dest(). Therefore, this check:

if (!is_power_of_2(mask) || cluster[ldr]) {
^here
will always fail, we are setting the new->logical_mode = KVM_APIC_MODE_MAP_DISABLED, which causing APICv/AVIC to always be inhibited.

I do not think this logic is correct. I also add debug printf to check, and I never see cluster[ldr] is NULL. Here is example of the debug printf just before the check above.

printk("DEBUG: vcpu_id=%u, logical_mode=%#x, mask=%#x, ldr=%#x, cluster[ldr]=%#llx, apic=%#llx\n", vcpu->vcpu_id, new->logical_mode, mask, ldr, (unsigned long long) cluster[ldr], (unsigned long long) apic);

DEBUG: vcpu_id=0, logical_mode=0x3, mask=0x1, ldr=0x0, cluster[ldr]=0xffff8f54fe7e8000, apic=0xffff8f54fe7e8000
DEBUG: vcpu_id=1, logical_mode=0x3, mask=0x2, ldr=0x1, cluster[ldr]=0xffff8f5475c28000, apic=0xffff8f5475c28000
DEBUG: vcpu_id=2, logical_mode=0x3, mask=0x4, ldr=0x2, cluster[ldr]=0xffff8f54ac488000, apic=0xffff8f54ac488000
DEBUG: vcpu_id=3, logical_mode=0x3, mask=0x8, ldr=0x3, cluster[ldr]=0xffff8f550dc34200, apic=0xffff8f550dc34200
DEBUG: vcpu_id=4, logical_mode=0x3, mask=0x10, ldr=0x4, cluster[ldr]=0xffff8f550dd08000, apic=0xffff8f550dd08000
.....
DEBUG: vcpu_id=15, logical_mode=0x3, mask=0x8000, ldr=0xf, cluster[ldr]=0xffff8f54ac488200, apic=0xffff8f54ac488200
DEBUG: vcpu_id=16, logical_mode=0x3, mask=0x1, ldr=0x0, cluster[ldr]=0xffff8f5531ff8000, apic=0xffff8f5531ff8000
DEBUG: vcpu_id=17, logical_mode=0x3, mask=0x2, ldr=0x1, cluster[ldr]=0xffff8f54e87c8200, apic=0xffff8f54e87c8200
DEBUG: vcpu_id=18, logical_mode=0x3, mask=0x4, ldr=0x2, cluster[ldr]=0xffff8f551c870200, apic=0xffff8f551c870200

Please, lemme know if I am missing something.

Best Regards,
Suravee

The loop containing this code is:

kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;

...
}

so it's impossible for cluster[ldr] to hold the current "apic", because this is
the first and only iteration that processes the current "apic".