Re: Thoughts of AMX KVM support based on latest kernel

From: Thomas Gleixner
Date: Tue Nov 16 2021 - 13:56:02 EST


Sean,

On Tue, Nov 16 2021 at 16:05, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Thomas Gleixner wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2686f2edb47c..9425fdbb4806 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -9576,6 +9576,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> vcpu->arch.last_vmentry_cpu = vcpu->cpu;
>> vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>
>> + kvm_update_guest_xfd_state();
>
> Is there a reason the XFD switch can't key off TIF_NEED_FPU_LOAD a la the other
> FPU stuff? I.e. piggyback this snippet in vcpu_enter_guest():

TIF_NEED_FPU_LOAD is not set here.

> if (test_thread_flag(TIF_NEED_FPU_LOAD))
> switch_fpu_return();

Assume guest has control of XFD and XFD writes are not trapped. That
means on vmexit the XFD state of the guest is unknown.

vcpu_run()
kvm_load_guest_fpu()
wrmsrl(XFD, guest_fpstate->xfd);
XRSTORS

do {

local_irq_disable();

// Covers the case of softirq usage and preemption
if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return()
wrmsrl(XFD, guest_fpstate->xfd);

do {
vmenter(); // Guest modifies XFD
} while (reenter);

local_irq_enable(); <- Problem starts here

preempt_enable(); <- Becomes wider here

} while (!breakout);

kvm_put_guest_fpu(); // Switch back to user FPU state

So we have the following cases:

guest_fpstate.xfd XFD at vmexit

0 0 // consistent state
1 0 // inconsistent state
0 1 // inconsistent state
1 1 // consistent state

Now assume that after reenabling interrupts a interrupt/softirq happens
which uses FPU. It will save the correct state because XFD is still
guest state, but the subsequent restore will operate on the stale
guest_fpstate.xfd value.

Same problem vs schedule after reenabling preemption or if not preempted
in kvm_put_guest_fpu()

Now you could argue that the interrupt/softirq XSAVES should also read
the XFD MSR and save it in guest_fpstate.xfd. Same in schedule()
and kvm_put_guest_fpu(), i.e:

XSAVES
if (fpstate->is_guest) {
rdmsrl(XFD, xfd);
fpstate->xfd = xfd;
__this_cpu_write(..., xfd);
}

We can do that, but I'm unhappy about this conditional in schedule(). So
I was asking for doing a simple KVM only solution first:

vcpu_run()
kvm_load_guest_fpu()
wrmsrl(XFD, guest_fpstate->xfd);
XRSTORS

do {

local_irq_disable();

if (test_thread_flag(TIF_NEED_FPU_LOAD))
switch_fpu_return()
wrmsrl(XFD, guest_fpstate->xfd);

do {
vmenter(); // Guest modifies XFD
} while (reenter);

update_xfd_state(); // Restore consistency

local_irq_enable();

and check how bad that is for KVM in terms of overhead on AMX systems.

If it really matters we can look at the conditional in XSAVES path.

Thanks,

tglx