Re: [PATCH v5 5/8] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit

From: Zeng Guang
Date: Fri Jan 14 2022 - 02:52:51 EST


On 1/14/2022 5:29 AM, Sean Christopherson wrote:
On Fri, Dec 31, 2021, Zeng Guang wrote:
In VMX non-root operation, new behavior applies to
"new behavior" is ambiguous, it's not clear if it refers to new hardware behavior,
new KVM behavior, etc...

virtualize WRMSR to vICR in x2APIC mode. Depending
Please wrap at ~75 chars, this is too narrow.

on settings of the VM-execution controls, CPU would
produce APIC-write VM-exit following the 64-bit value
written to offset 300H on the virtual-APIC page(vICR).
KVM needs to retrieve the value written by CPU and
emulate the vICR write to deliver an interrupt.

Current KVM doesn't consider to handle the 64-bit setting
on vICR in trap-like APIC-write VM-exit. Because using
kvm_lapic_reg_write() to emulate writes to APIC_ICR requires
the APIC_ICR2 is already programmed correctly. But in the
above APIC-write VM-exit, CPU writes the whole 64 bits to
APIC_ICR rather than program higher 32 bits and lower 32
bits to APIC_ICR2 and APIC_ICR respectively. So, KVM needs
to retrieve the whole 64-bit value and program higher 32 bits
to APIC_ICR2 first.
I think this is simply saying:

Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR,
i.e. will trap and generate an APIC-write VM-Exit instead of intercepting
the WRMSR. Add support for handling "nodecode" x2APIC writes, which were
previously impossible.

Note, x2APIC MSR writes are 64 bits wide.

and then the shortlog can be:

KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode

The "interrupt dispatch" part is quite confusing because it's not really germane
to the change; yes, the vICR write does (eventually) dispatch an IRQ, but that
has nothing to do with the code being modified.

I would take commit message as you suggested. Thanks.

Signed-off-by: Zeng Guang <guang.zeng@xxxxxxxxx>
---
arch/x86/kvm/lapic.c | 12 +++++++++---
arch/x86/kvm/lapic.h | 5 +++++
2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f206fc35deff..3ce7142ba00e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2186,15 +2186,21 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
/* emulate APIC access in a trap manner */
void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
{
- u32 val = 0;
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ u64 val = 0;
/* hw has done the conditional check and inst decode */
offset &= 0xff0;
- kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
+ /* exception dealing with 64bit data on vICR in x2apic mode */
+ if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {
Sorry, I failed to reply to your response in the previous version. I suggested
a WARN_ON(offset != APIC_ICR), but you were concerned that apic_x2apic_mode()
would be expensive to check before @offset. I don't think that's a valid concern
as apic_x2apic_mode() is simply:

apic->vcpu->arch.apic_base & X2APIC_ENABLE

And is likely well-predicted by the CPU, especially in single tenant or pinned
scenarios where the pCPU is running a single VM/vCPU, i.e. will amost never see
X2APIC_ENABLE toggling.

So I stand behind my previous feedback[*] that we should split on x2APIC.

+ val = kvm_lapic_get_reg64(apic, offset);
+ kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
+ } else
+ kvm_lapic_reg_read(apic, offset, 4, &val);
Needs curly braces. But again, I stand behind my previous feedback that this
would be better written as:

if (apic_x2apic_mode(apic)) {
if (WARN_ON_ONCE(offset != APIC_ICR))
return 1;

kvm_lapic_reg_read(apic, offset, 8, &val);
kvm_lapic_reg_write64(apic, offset, val);
} else {
kvm_lapic_reg_read(apic, offset, 4, &val);
kvm_lapic_reg_write(apic, offset, val);
}

after a patch (provided in earlier feedback) to introduce kvm_lapic_reg_write64().

[*] https://lore.kernel.org/all/YTvcJZSd1KQvNmaz@xxxxxxxxxx

kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to support 64bit
read. And another concern is here getting reg value only specific from vICR(no other regs
need take care), going through whole path on kvm_lapic_reg_read() could be time-consuming
unnecessarily. Is it proper that calling kvm_lapic_get_reg64() to retrieve vICR value directly?

The change could be like follows:

if (apic_x2apic_mode(apic)) {
if (WARN_ON_ONCE(offset != APIC_ICR))
return 1;

val = kvm_lapic_get_reg64(apic, offset);
kvm_lapic_reg_write64(apic, offset, val);
} else {
kvm_lapic_reg_read(apic, offset, 4, &val);
kvm_lapic_reg_write(apic, offset, val);
}


/* TODO: optimize to just emulate side effect w/o one more write */
- kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
+ kvm_lapic_reg_write(apic, offset, (u32)val);
}
EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);