Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning

From: Liran Alon
Date: Mon Dec 23 2019 - 17:51:21 EST




> On 23 Dec 2019, at 19:46, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 23/12/19 18:28, Liran Alon wrote:
>>>> Why reset CR pinned MSRs by userspace instead of KVM INIT
>>>> handling?
>>> Most MSRs are not reset by INIT, are they?
>>>
>>> Paolo
>>>
>> MSR_KVM_SYSTEM_TIME saved in vcpu->arch.time is reset at
>> kvmclock_reset() which is called by kvm_vcpu_reset() (KVM INIT
>> handler). In addition, vmx_vcpu_reset(), called from
>> kvm_vcpu_reset(), also resets multiple MSRs such as:
>> MSR_IA32_SPEC_CTRL (vmx->spec_ctrl) and MSR_IA32_UMWAIT_CONTROL
>> (msr_ia32_umwait_control).
>
> These probably can be removed, since they are zero at startup and at
> least SPEC_CTRL is documented[1] to be unaffected by INIT. However, I
> couldn't find information about UMWAIT_CONTROL.
>
>> Having said that, I see indeed that most of MSRs are being set by
>> QEMU in kvm_put_msrs() when level >= KVM_PUT_RESET_STATE. When is
>> triggered by qemu_system_reset() -> cpu_synchronize_all_post_reset ->
>> cpu_synchronize_post_reset() -> kvm_cpu_synchronize_post_reset().
>>
>> So given current design, OK I agree with you that CR pinned MSRs
>> should be zeroed by userspace VMM.
>>
>> It does though seems kinda weird to me that part of CPU state is
>> initialised on KVM INIT handler and part of it in userspace VMM. It
>> could lead to inconsistent (i.e. diverging from spec) CPU behaviour.
>
> The reason for that is the even on real hardware INIT does not touch
> most MSRs:
>
> 9.1 Initialization overview
>
> Asserting the INIT# pin on the processor invokes a similar response to
> a hardware reset. The major difference is that during an INIT, the
> internal caches, MSRs, MTRRs, and x87 FPU state are left unchanged
> (although, the TLBs and BTB are invalidated as with a hardware reset).
> An INIT provides a method for switching from protected to real-address
> mode while maintaining the contents of the internal caches.
>
> Paolo
>
> [1]
> https://urldefense.proofpoint.com/v2/url?u=https-3A__software.intel.com_security-2Dsoftware-2Dguidance_api-2Dapp_sites_default_files_336996-2DSpeculative-2DExecution-2DSide-2DChannel-2DMitigations.pdf&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=MRhvCoZcvqLvHMhI3y3iSkpiAfitrNoCgPtWaTXhzNQ&s=66DAue_Ojp-a_msxZ2W9bLtQWJB4W1p9F9GrBBQ8dnw&e=
>

Oh right. Thatâs true.
There is a difference between power-up and asserting RESET# pin to asserting INIT# pin or receiving INIT IPI.
The first does reset all the processor state while the latter reset only part of it as indeed Intel SDM section 9.1 describes.

So basically because KVM is only aware of INIT IPI but not on power-up and asserting RESET# pin events,
then the job of emulating the latter events is currently implemented in userspace VMM.

Having said that, one could argue that because KVM implements the vCPU, it should just expose a relevant ioctl
for userspace VMM to trigger âfull vCPU state resetâ on power-up or asserting RESET# pin events.
i.e. Making userspace VMM only the one which emulates the peripheral hardware that triggers the event,
but not implementing the vCPU emulation that responds to this event. I think this would have resulted in a cleaner design.

But we are diverting from original patch-series discussion so if we want to discuss this further, letâs open a separate email thread on that. :)

-Liran