Re: [PATCH v2 2/3] KVM: x86: Move pkru save/restore to x86.c

From: Jim Mattson
Date: Fri May 08 2020 - 18:10:08 EST


On Fri, May 8, 2020 at 2:10 PM Babu Moger <babu.moger@xxxxxxx> wrote:
>
> PKU feature is supported by both VMX and SVM. So we can
> safely move pkru state save/restore to common code.
> Also move all the pkru data structure to kvm_vcpu_arch.
>
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmx.c | 18 ------------------
> arch/x86/kvm/x86.c | 20 ++++++++++++++++++++
> 3 files changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 42a2d0d3984a..afd8f3780ae0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -578,6 +578,7 @@ struct kvm_vcpu_arch {
> unsigned long cr4;
> unsigned long cr4_guest_owned_bits;
> unsigned long cr8;
> + u32 host_pkru;
> u32 pkru;
> u32 hflags;
> u64 efer;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c2c6335a998c..46898a476ba7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1372,7 +1372,6 @@ void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> vmx_vcpu_pi_load(vcpu, cpu);
>
> - vmx->host_pkru = read_pkru();
> vmx->host_debugctlmsr = get_debugctlmsr();
> }
>
> @@ -6577,11 +6576,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> kvm_load_guest_xsave_state(vcpu);
>
> - if (static_cpu_has(X86_FEATURE_PKU) &&
> - kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
> - vcpu->arch.pkru != vmx->host_pkru)
> - __write_pkru(vcpu->arch.pkru);
> -
> pt_guest_enter(vmx);
>
> if (vcpu_to_pmu(vcpu)->version)
> @@ -6671,18 +6665,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>
> pt_guest_exit(vmx);
>
> - /*
> - * eager fpu is enabled if PKEY is supported and CR4 is switched
> - * back on host, so it is safe to read guest PKRU from current
> - * XSAVE.
> - */
> - if (static_cpu_has(X86_FEATURE_PKU) &&
> - kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
> - vcpu->arch.pkru = rdpkru();
> - if (vcpu->arch.pkru != vmx->host_pkru)
> - __write_pkru(vmx->host_pkru);
> - }
> -
> kvm_load_host_xsave_state(vcpu);
>
> vmx->nested.nested_run_pending = 0;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5835f9cb9ad..1b27e78fb3c1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -836,11 +836,28 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
> vcpu->arch.ia32_xss != host_xss)
> wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
> }
> +
> + if (static_cpu_has(X86_FEATURE_PKU) &&
> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE) &&
> + vcpu->arch.pkru != vcpu->arch.host_pkru)
> + __write_pkru(vcpu->arch.pkru);

This doesn't seem quite right to me. Though rdpkru and wrpkru are
contingent upon CR4.PKE, the PKRU resource isn't. It can be read with
XSAVE and written with XRSTOR. So, if we don't set the guest PKRU
value here, the guest can read the host value, which seems dodgy at
best.

Perhaps the second conjunct should be: (kvm_read_cr4_bits(vcpu,
X86_CR4_PKE) || (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)).

> }
> EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>
> void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
> {
> + /*
> + * eager fpu is enabled if PKEY is supported and CR4 is switched
> + * back on host, so it is safe to read guest PKRU from current
> + * XSAVE.
> + */

I don't understand the relevance of this comment to the code below.

> + if (static_cpu_has(X86_FEATURE_PKU) &&
> + kvm_read_cr4_bits(vcpu, X86_CR4_PKE)) {
> + vcpu->arch.pkru = rdpkru();
> + if (vcpu->arch.pkru != vcpu->arch.host_pkru)
> + __write_pkru(vcpu->arch.host_pkru);
> + }
> +

Same concern as above, but perhaps worse in this instance, since a
guest with CR4.PKE clear could potentially use XRSTOR to change the
host PKRU value.

> if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
>
> if (vcpu->arch.xcr0 != host_xcr0)
> @@ -3570,6 +3587,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> kvm_x86_ops.vcpu_load(vcpu, cpu);
>
> + /* Save host pkru register if supported */
> + vcpu->arch.host_pkru = read_pkru();
> +
> /* Apply any externally detected TSC adjustments (due to suspend) */
> if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
> adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
>