Re: [PATCH] KVM: x86: add hint to skip hidden rdpkru under kvm_load_host_xsave_state

From: Jon Kohler
Date: Sun May 16 2021 - 22:52:08 EST




> On May 14, 2021, at 1:11 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Fri, May 7, 2021 at 9:45 AM Jon Kohler <jon@xxxxxxxxxxx> wrote:
>>
>> kvm_load_host_xsave_state handles xsave on vm exit, part of which is
>> managing memory protection key state. The latest arch.pkru is updated
>> with a rdpkru, and if that doesn't match the base host_pkru (which
>> about 70% of the time), we issue a __write_pkru.
>
> This thread caused me to read the code, and I don't get how it's even
> remotely correct.
>
> First, kvm_load_guest_fpu() has this delight:
>
> /*
> * Guests with protected state can't have it set by the hypervisor,
> * so skip trying to set it.
> */
> if (vcpu->arch.guest_fpu)
> /* PKRU is separately restored in kvm_x86_ops.run. */
> __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
> ~XFEATURE_MASK_PKRU);
>
> That's nice, but it fails to restore XINUSE[PKRU]. As far as I know,
> that bit is live, and the only way to restore it to 0 is with
> XRSTOR(S).

Thanks, Andy - Adding Tom Lendacky to this thread as that
Particular snippet was last touched in ~5.11 in ed02b213098a.

>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cebdaa1e3cf5..cd95adbd140c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>> }
>>
>> if (static_cpu_has(X86_FEATURE_PKU) &&
>> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
>> - vcpu->arch.pkru != vcpu->arch.host_pkru)
>> - __write_pkru(vcpu->arch.pkru);
>> + vcpu->arch.pkru != vcpu->arch.host_pkru &&
>> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
>> + __write_pkru(vcpu->arch.pkru, false);
>
> Please tell me I'm missing something (e.g. KVM very cleverly managing
> the PKRU register using intercepts) that makes this reliably load the
> guest value. An innocent or malicious guest could easily make that
> condition evaluate to false, thus allowing the host PKRU value to be
> live in guest mode. (Or is something fancy going on here?)
>
> I don't even want to think about what happens if a perf NMI hits and
> accesses host user memory while the guest PKRU is live (on VMX -- I
> think this can't happen on SVM).

Perhaps Babu / Dave can comment on the exiting logic here? Babu did
some additional work to fix save/restore on 37486135d3a.

From this changes perspective, I’m just trying to get to the resultant
evaluation a bit quicker, which this change (and the v3) seems to do
ok with; however, if I’ve ran my ship into a larger ice berg, happy to
collaborate to make it better overall.

>
>> }
>> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>>
>> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>> return;
>>
>> if (static_cpu_has(X86_FEATURE_PKU) &&
>> - (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
>> + ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
>> vcpu->arch.pkru = rdpkru();
>> if (vcpu->arch.pkru != vcpu->arch.host_pkru)
>> - __write_pkru(vcpu->arch.host_pkru);
>> + __write_pkru(vcpu->arch.host_pkru, true);
>> }
>
> Suppose the guest writes to PKRU and then, without exiting, sets PKE =
> 0 and XCR0[PKRU] = 0. (Or are the intercepts such that this can't
> happen except on SEV where maybe SEV magic makes the problem go away?)
>
> I admit I'm fairly mystified as to why KVM doesn't handle PKRU like
> the rest of guest XSTATE.

I’ll defer to Dave/Babu here. This code has been this way for a bit now,
I’m not sure at first glance if that situation could occur intentionally
or accidentally, but that would evaluate the same both before and
after this change, no?