Re: [PATCH RFC] KVM: nSVM: Fix L1 state corruption upon return from SMM

From: Paolo Bonzini
Date: Wed Jun 23 2021 - 09:21:57 EST


On 23/06/21 15:01, Maxim Levitsky wrote:
I did some homework on this now and I would like to share a few of my thoughts on it:

First of all, what caught my attention is the way we intercept the #SMI
(this isn't 100% related to the bug but still worth talking about IMHO)

A. Bare metal: It looks like SVM allows intercepting SMI, with SVM_EXIT_SMI,
with the intention of then entering the BIOS SMM handler manually using the SMM_CTL MSR.

... or just using STGI, which is what happens for KVM. This is in the manual: "The hypervisor may respond to the #VMEXIT(SMI) by executing the STGI instruction, which causes the pending SMI to be taken immediately".

It *should* work for KVM to just not intercept SMI, but it adds more complexity for no particular gain.

On bare metal we do set INTERCEPT_SMI, but we emulate the exit as a nop.
I guess on bare metal there are some undocumented bits that the BIOS sets which
make the CPU ignore that SMI intercept and still enter the #SMI handler
normally, but I wonder if we could still break some motherboard
code due to that.

B. Nested: If #SMI is intercepted, then it causes nested VMEXIT.
Since KVM does enable SMI intercept, when it runs nested it means that all SMIs
that nested KVM gets are emulated as NOP, and L1's SMI handler is not run.

No, this is incorrect. Note that svm_check_nested_events does not clear smi_pending the way vmx_check_nested_events does it for nmi_pending. So the interrupt is still there and will be injected on the next STGI.
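
To make the point above concrete, here is a minimal toy model in C. It is not KVM code: the struct, its fields and the toy_* functions are hypothetical names invented for illustration. It only assumes what is stated above (an intercepted SMI causes a nested #VMEXIT without clearing the pending SMI, and the SMI is taken on the next STGI) plus the architectural fact that #VMEXIT clears GIF.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not KVM's. */
struct toy_vcpu {
    bool gif;             /* global interrupt flag as seen by L1 */
    bool in_guest_mode;   /* true while L2 is running */
    bool smi_pending;     /* an SMI is waiting to be delivered */
    bool smi_intercepted; /* L1 set the SMI intercept for L2 */
    int  nested_exits;    /* number of #VMEXIT(SMI) reflected to L1 */
    int  smis_delivered;  /* number of SMIs actually taken */
};

/* An SMI arrives; it stays pending until it can be delivered. */
static void toy_raise_smi(struct toy_vcpu *v)
{
    v->smi_pending = true;
}

/*
 * Event check: an intercepted SMI while L2 runs causes a nested exit,
 * but smi_pending is deliberately left set. Otherwise the SMI is taken
 * only when GIF is set.
 */
static void toy_check_events(struct toy_vcpu *v)
{
    if (!v->smi_pending)
        return;

    if (v->in_guest_mode && v->smi_intercepted) {
        v->in_guest_mode = false;
        v->gif = false;          /* #VMEXIT clears GIF */
        v->nested_exits++;
        return;                  /* SMI remains pending */
    }

    if (v->gif) {
        v->smi_pending = false;
        v->smis_delivered++;
    }
}

/* L1 executes STGI: GIF is set and the pending SMI is taken. */
static void toy_stgi(struct toy_vcpu *v)
{
    v->gif = true;
    toy_check_events(v);
}

/* Walk through the scenario; returns the number of SMIs delivered. */
static int toy_intercepted_smi_scenario(void)
{
    struct toy_vcpu v = {
        .gif = true, .in_guest_mode = true, .smi_intercepted = true,
    };

    toy_raise_smi(&v);
    toy_check_events(&v);        /* nested exit, SMI still pending */
    assert(v.smi_pending && v.smis_delivered == 0);

    toy_stgi(&v);                /* L1's STGI lets the SMI through */
    return v.smis_delivered;
}
```

In this sketch the SMI is not lost after the nested exit; it is delivered once, when the modelled L1 executes STGI.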

Paolo


About the issue that was fixed in this patch. Let me try to understand how
it would work on bare metal:

1. A guest is entered. Host state is saved to VM_HSAVE_PA area (or stashed somewhere
in the CPU)

2. #SMI (without intercept) happens

3. The CPU has to exit SVM and start running the host SMI handler. It saves the
interrupted state to the SMM area without touching the VM_HSAVE_PA area, runs the
SMI handler, and then, once it RSMs, restores the guest state from the SMM area
and continues the guest

4. Once a normal VMexit happens, the host state is restored from VM_HSAVE_PA

So host state indeed can't be saved to VMCB01.
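
Steps 1-4 above can be sketched as a small toy model in C. This is only an illustration of the sequence as described here, not of real hardware or KVM: the structs and toy_* functions are hypothetical, and each "state" is reduced to two registers. The point it shows is that the SMM save area and the host save area are separate, so an SMI taken while the guest runs leaves the saved host state intact.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical. */
struct toy_state {
    unsigned long rip;
    unsigned long cr3;
};

struct toy_cpu {
    struct toy_state cur;    /* state currently loaded in the CPU */
    struct toy_state hsave;  /* stands in for the VM_HSAVE_PA area */
    struct toy_state smram;  /* stands in for the SMM state save area */
    bool in_guest;
    bool in_smm;
};

/* Step 1: enter the guest; host state goes to the host save area. */
static void toy_vmrun(struct toy_cpu *c, struct toy_state guest)
{
    c->hsave = c->cur;
    c->cur = guest;
    c->in_guest = true;
}

/* Steps 2-3: SMI without intercept; only the SMM area is written. */
static void toy_smi(struct toy_cpu *c, struct toy_state handler)
{
    c->smram = c->cur;
    c->cur = handler;
    c->in_smm = true;
}

/* Step 3 (end): RSM restores whatever was interrupted, here the guest. */
static void toy_rsm(struct toy_cpu *c)
{
    c->cur = c->smram;
    c->in_smm = false;
}

/* Step 4: a normal #VMEXIT restores the host from the host save area. */
static void toy_vmexit(struct toy_cpu *c)
{
    c->cur = c->hsave;
    c->in_guest = false;
}

/* Run steps 1-4 and report whether the host state survived the SMI. */
static bool toy_host_state_survives_smi(void)
{
    struct toy_state host    = { .rip = 0x1000, .cr3 = 0xa000 };
    struct toy_state guest   = { .rip = 0x2000, .cr3 = 0xb000 };
    struct toy_state handler = { .rip = 0x8000, .cr3 = 0 };
    struct toy_cpu c = { .cur = host };

    toy_vmrun(&c, guest);
    toy_smi(&c, handler);
    toy_rsm(&c);
    assert(c.in_guest && c.cur.rip == guest.rip);

    toy_vmexit(&c);
    return c.cur.rip == host.rip && c.cur.cr3 == host.cr3;
}
```

If the host state were instead kept in an area that SMM entry and exit also rewrite, the final restore in this model would no longer return the original host values.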

To be honest, I think I would prefer not to use L1's hsave area, but rather add back our
own 'hsave' in KVM and always store the L1 host state there on nested entry.

This way we will avoid touching the vmcb01 at all, which both solves the issue and
reduces code complexity.
(The copying of L1 host state to what is basically the L1 guest state area and back
even has a comment explaining why it was possible to do so, before you discovered
that this doesn't work with SMM.)

Thanks again for fixing this bug!

Best regards,
Maxim Levitsky