-----Original Message-----
From: Yang Zhang [mailto:yang.zhang.wz@xxxxxxxxx]
Sent: Monday, December 21, 2015 10:01 AM
To: Wu, Feng <feng.wu@xxxxxxxxx>; pbonzini@xxxxxxxxxx;
rkrcmar@xxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts
On 2015/12/21 9:55, Wu, Feng wrote:
handle
-----Original Message-----
From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
owner@xxxxxxxxxxxxxxx] On Behalf Of Yang Zhang
Sent: Monday, December 21, 2015 9:50 AM
To: Wu, Feng <feng.wu@xxxxxxxxx>; pbonzini@xxxxxxxxxx;
rkrcmar@xxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts
On 2015/12/16 9:37, Feng Wu wrote:
Use vector-hashing to deliver lowest-priority interrupts for++++++++++++++++++++++++++++++++++++++++++++++++++++
VT-d posted-interrupts.
Signed-off-by: Feng Wu <feng.wu@xxxxxxxxx>
---
arch/x86/kvm/lapic.c | 67
arch/x86/kvm/lapic.h | 2 ++
arch/x86/kvm/vmx.c | 12 ++++++++--
3 files changed, 79 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e29001f..d4f2c8f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -854,6 +854,73 @@ out:
}
/*
+ * This routine handles lowest-priority interrupts using vector-hashing
+ * mechanism. As an example, modern Intel CPUs use this method to
right+ * lowest-priority interrupts.
+ *
+ * Here is the details about the vector-hashing mechanism:
+ * 1. For lowest-priority interrupts, store all the possible destination
+ * vCPUs in an array.
+ * 2. Use "guest vector % max number of destination vCPUs" to find the
*kvm,+ * destination vCPU in the array for the lowest-priority interrupt.struct kvm_lapic_irq *irq,
+ */
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq)
+{
+ struct kvm_apic_map *map;
+ struct kvm_vcpu *vcpu = NULL;
+
+ if (irq->shorthand)
+ return NULL;
+
+ rcu_read_lock();
+ map = rcu_dereference(kvm->arch.apic_map);
+
+ if (!map)
+ goto out;
+
+ if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
+ kvm_lowest_prio_delivery(irq)) {
+ u16 cid;
+ int i, idx = 0;
+ unsigned long bitmap = 1;
+ unsigned int dest_vcpus = 0;
+ struct kvm_lapic **dst = NULL;
+
+
+ if (!kvm_apic_logical_map_valid(map))
+ goto out;
+
+ apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
+
+ if (cid >= ARRAY_SIZE(map->logical_map))
+ goto out;
+
+ dst = map->logical_map[cid];
+
+ for_each_set_bit(i, &bitmap, 16) {
+ if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
+ clear_bit(i, &bitmap);
+ continue;
+ }
+ }
+
+ dest_vcpus = hweight16(bitmap);
+
+ if (dest_vcpus != 0) {
+ idx = kvm_vector_2_index(irq->vector, dest_vcpus,
+ &bitmap, 16);
+ vcpu = dst[idx-1]->vcpu;
+ }
+ }
+
+out:
+ rcu_read_unlock();
+ return vcpu;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
+
+/*
* Add a pending IRQ into lapic.
* Return 1 if successfully added and 0 if discarded.
*/
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6890ef0..52bffce 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
struct kvm_vcpu **dest_vcpu);
int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
const unsigned long *bitmap, u32 bitmap_size);
+struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
+ struct kvm_lapic_irq *irq);
#endif
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..3f89189 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm
unsigned int host_irq,
*/APIC_DM_LOWEST)
kvm_set_msi_irq(e, &irq);
- if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
- continue;
+
+ if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+ if (!kvm_vector_hashing_enabled() ||
+ irq.delivery_mode !=
+ continue;
+
+ vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
+ if (!vcpu)
+ continue;
+ }
I am a little confused with the 'continue'. If the destination is not
single vcpu, shouldn't we rollback to use non-PI mode?
Here is the logic:
- If it is single destination, we will use PI no matter it is fixed or lowest-priority.
- If it is not single destination:
a) It is fixed, we will use non-PI
b) It is lowest-priority and vector-hashing is enabled, we will use PI
c) otherwise, use non-PI
If it is single destination previously, then change to no-single mode.
Can current code cover this case?
In my test, before setting irq affinity (change single vcpu to non-single vcpu
in this case), the guest will mask the interrupt first, so before getting here, IRTE
has been changed back to remapped mode already(when guest masks the MSIx,
we will change back to remapped mode), hence nothing needed here.
Digging into the linux code (guest) a bit more, I found that if interrupt remapping
is not enabled in the guest (IR is not supported for guest anyway), it will always
mask the MSI/MSIx before setting the irq affinity. So the code should work
well currently.
However, for robustness, I think explicitly changing IRTE back to remapped
mode for the 'continue' case should be a good idea.
Radim, Paolo, what are your guys' options about this? Any comments are
appreciated! Thanks a lot!
Thanks,
Feng