Re: [PATCH RFC 3/7] kvm: x86: XSAVE state and XFD MSRs context switch
From: Liu, Jing2
Date: Mon Feb 22 2021 - 03:38:37 EST
On 2/9/2021 2:12 AM, Paolo Bonzini wrote:
> On 08/02/21 19:04, Sean Christopherson wrote:
>>> That said, the case where we saw MSR autoload as faster involved EFER,
>>> and we decided that it was due to TLB flushes (commit f6577a5fa15d,
>>> "x86, kvm, vmx: Always use LOAD_IA32_EFER if available", 2014-11-12).
>>> Do you know if RDMSR/WRMSR is always slower than MSR autoload?
>>
>> RDMSR/WRMSR may be marginally slower, but only because the autoload stuff
>> avoids serializing the pipeline after every MSR.
>
> That's probably adding up quickly...
>
>> The autoload paths are effectively just wrappers around the WRMSR ucode,
>> plus some extra VM-Enter specific checks, as ucode needs to perform all
>> the normal fault checks on the index and value.
>> On the flip side, if the load lists are dynamically constructed, I suspect
>> the code overhead of walking the lists negates any advantages of the load
>> lists.
>
> ... but yeah this is not very encouraging.
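To make the point about dynamically constructed load lists concrete, here is a minimal C sketch of updating a VMX MSR-load area. The 16-byte entry layout (index in bits 31:0, reserved bits 63:32, data in bits 127:64) follows the SDM, and 0x1C4 is the IA32_XFD index; `struct msr_autoload_list`, its capacity and `msr_autoload_set()` are hypothetical names for illustration, not KVM's implementation. Every update walks the list before it can patch or append an entry.

```c
#include <assert.h>
#include <stdint.h>

/* One entry of a VMX MSR-load/MSR-store area: 16 bytes per MSR. */
struct msr_autoload_entry {
	uint32_t index;    /* MSR index, bits 31:0 */
	uint32_t reserved; /* bits 63:32, must be zero */
	uint64_t value;    /* MSR data, bits 127:64 */
};

_Static_assert(sizeof(struct msr_autoload_entry) == 16,
	       "each MSR-load entry is 16 bytes");

#define MSR_AUTOLOAD_MAX 8      /* hypothetical capacity for this sketch */
#define MSR_IA32_XFD     0x1c4u

/* Hypothetical list wrapper; the 16-byte area alignment is not modeled. */
struct msr_autoload_list {
	unsigned int nr;
	struct msr_autoload_entry entries[MSR_AUTOLOAD_MAX];
};

/*
 * Hypothetical helper (not KVM's): set @index to @value in @list,
 * appending an entry if the MSR is not present yet. Each call pays for
 * a linear walk of the list. Returns 0, or -1 if the list is full.
 */
static int msr_autoload_set(struct msr_autoload_list *list, uint32_t index,
			    uint64_t value)
{
	assert(list->nr <= MSR_AUTOLOAD_MAX);

	for (unsigned int i = 0; i < list->nr; i++) {
		if (list->entries[i].index == index) {
			list->entries[i].value = value; /* update in place */
			return 0;
		}
	}
	if (list->nr == MSR_AUTOLOAD_MAX)
		return -1;
	list->entries[list->nr].index = index;
	list->entries[list->nr].reserved = 0;
	list->entries[list->nr].value = value;
	list->nr++;
	return 0;
}
```

Whether that walk outweighs the pipeline serialization saved by autoload is exactly the open question in the thread; the sketch only shows where the extra work comes from.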
Thanks for reviewing the patches.
> Context switch time is a problem for XFD. In a VM that uses AMX, most
> threads in the guest will have nonzero XFD but the vCPU thread itself
> will have zero XFD. So as soon as one thread in the VM forces the
> vCPU thread to clear XFD, you pay a price on all vmexits and vmentries.
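A rough model of that cost: if IA32_XFD differs between the guest and the vCPU thread, every vmentry/vmexit round trip needs two XFD writes, while equal values need none once the current value is cached. `struct xfd_cache`, `xfd_write()` and `xfd_round_trips()` are hypothetical; the counter stands in for a serializing WRMSR and no real MSR is accessed.

```c
#include <stdint.h>

/* Hypothetical per-CPU cache of the value currently in IA32_XFD. */
struct xfd_cache {
	uint64_t current;          /* value believed to be in IA32_XFD */
	unsigned long wrmsr_count; /* number of simulated WRMSRs */
};

/* Simulated XFD write: skip it when the value would not change. */
static void xfd_write(struct xfd_cache *c, uint64_t val)
{
	if (c->current == val)
		return;
	c->current = val;
	c->wrmsr_count++;
}

/* Simulate @n round trips and return how many XFD writes were needed. */
static unsigned long xfd_round_trips(uint64_t host_xfd, uint64_t guest_xfd,
				     unsigned long n)
{
	struct xfd_cache c = { .current = host_xfd, .wrmsr_count = 0 };

	for (unsigned long i = 0; i < n; i++) {
		xfd_write(&c, guest_xfd); /* vmentry: load the guest value */
		xfd_write(&c, host_xfd);  /* vmexit: restore the host value */
	}
	return c.wrmsr_count;
}
```

With host_xfd = 0 and guest_xfd = 1 << 18, 1000 round trips cost 2000 writes; running the vCPU thread with the guest's bits set as well brings that to zero.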
Spec says,
"If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it saves bit i of XSTATE_BV field of the XSAVE header as 0
(indicating that the state component is in its initialized state).
With the exception of XSAVE, no data is saved for the state
component (XSAVE saves the initial value of the state component..."
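A simplified C model of the quoted behavior, assuming XTILEDATA is state component 18 (AMX tile data). It computes the XSTATE_BV value that XSAVE writes to the header and shows that arming XFD for a live component makes the header report it as initial. The function name and the simplifications (no compaction, no XSAVEOPT/XSAVES modified-state tracking, no fault checks) belong to this sketch, not to the SDM pseudocode.

```c
#include <stdint.h>

#define XFEATURE_XTILEDATA 18 /* XSAVE state component 18: AMX tile data */

/*
 * Simplified model: for each requested component i (RFBM[i] = 1),
 * XSTATE_BV[i] records XINUSE[i], except that it is written as 0 when
 * IA32_XFD[i] = 1. Bits outside RFBM keep their old header value.
 */
static uint64_t xsave_xstate_bv(uint64_t old_bv, uint64_t xcr0,
				uint64_t edx_eax, uint64_t xinuse,
				uint64_t xfd)
{
	uint64_t rfbm = xcr0 & edx_eax; /* requested-feature bitmap */

	return (old_bv & ~rfbm) | (xinuse & rfbm & ~xfd);
}
```

With XINUSE[18]=1 and IA32_XFD[18]=1, bit 18 of the returned value is 0, so a later XRSTOR would load the initial tile state instead of the live one.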
Thus, the key point is not to lose non-initial AMX state across vmexit
and vmentry. If the AMX state is in its initial configuration, it doesn't
matter; otherwise, XFD[i] should not be armed (set to 1).
If we don't want to unconditionally set XFD=0 on every vmexit, it would
be useful to first detect whether the guest AMX state is initial or not.
How about using XINUSE here?
(Details are in SDM Vol. 1, 13.6 PROCESSOR TRACKING OF XSAVE-MANAGED
STATE, and in the Operation section of XRSTOR/XRSTORS in Vol. 2.)
The main idea is that the processor tracks the status of each state
component in XINUSE, which shows whether the component is in use.
When XINUSE[i]=0, state component i is in its initial configuration, so
arming XFD[i] loses nothing. Otherwise, KVM has to take care of XFD on
vmexit.
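One way to express this check, as a minimal sketch: start from the XFD value we would like to run with (for example host_XFD|guest_XFD) and drop every bit whose component is still in use. `xfd_safe_to_arm()` is a hypothetical helper; on real hardware the XINUSE bits would come from XGETBV with ECX=1 (which returns XCR0 AND XINUSE when CPUID.(EAX=0DH,ECX=1):EAX[2] is set), but here they are just a parameter.

```c
#include <stdint.h>

#define XFEATURE_MASK_XTILEDATA (1ULL << 18) /* AMX tile data */

/*
 * Hypothetical helper: pick the IA32_XFD value to arm after a vmexit.
 * @desired: value we would like to run with, e.g. host_xfd | guest_xfd.
 * @xinuse:  XINUSE bitmap of the live (guest) register state.
 * A component that is still in use must not be armed in XFD, or a later
 * XSAVE would record it as initial and its contents would be lost.
 */
static uint64_t xfd_safe_to_arm(uint64_t desired, uint64_t xinuse)
{
	return desired & ~xinuse;
}
```

When the guest's tile data is initial, the full value can be armed and no extra write to clear XFD is needed; only live XTILEDATA forces XFD[18] to stay clear until that state has been saved.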
> However, running the host with _more_ bits set than necessary in XFD
> should not be a problem as long as the host doesn't use the AMX
> instructions.
Does "running the host" mean running in KVM? Why would we need more bits
(host_XFD|guest_XFD)? I'm trying to think of a case where guest_XFD alone
is not enough, e.g.:
In the guest, bit i is only needed when the guest supports the feature and
uses the passed-through XFD[i] to detect dynamic usage;
In KVM, KVM doesn't use AMX instructions, and "system software should not
use XFD to implement a 'lazy restore' approach to management of the
XTILEDATA state component.";
Outside KVM, the kernel ensures that the correct XFD is set for each thread
when scheduling.
Thanks,
Jing
> So perhaps Jing can look into keeping XFD=0 for as little time as
> possible, and XFD=host_XFD|guest_XFD as much as possible.
>
> Paolo