Re: [RFC PATCH 05/35] KVM: SVM: Add initial support for SEV-ES GHCB access to KVM

From: Tom Lendacky
Date: Wed Sep 16 2020 - 13:38:53 EST


On 9/15/20 11:28 AM, Sean Christopherson wrote:
> On Tue, Sep 15, 2020 at 08:24:22AM -0500, Tom Lendacky wrote:
>> On 9/14/20 3:58 PM, Sean Christopherson wrote:
>>>> @@ -79,6 +88,9 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
>>>> 	if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
>>>> 		return;
>>>>
>>>> +	if (kvm_x86_ops.reg_write_override)
>>>> +		kvm_x86_ops.reg_write_override(vcpu, reg, val);
>>>
>>>
>>> There has to be a more optimal approach for propagating registers between
>>> vcpu->arch.regs and the VMSA than adding a per-GPR hook. Why not simply
>>> copy the entire set of registers to/from the VMSA on every exit and entry?
>>> AFAICT, valid_bits is only used in the read path, and KVM doesn't do anything
>>> sophisticated when it hits a !valid_bits read.
>>
>> That would probably be ok. And actually, the code might be able to just
>> check the GHCB valid bitmap for valid regs on exit, copy them and then
>> clear the bitmap. The write code could check if vmsa_encrypted is set and
>> then set a "valid" bit for the reg that could be used to set regs on entry.
>>
>> I'm not sure if turning kvm_vcpu_arch.regs into a struct and adding a
>> valid bit would be overkill or not.
>
> KVM already has space in regs_avail and regs_dirty for GPRs, they're just not
> used by the get/set helpers because they're always loaded/stored for both SVM
> and VMX.
>
> I assume nothing will break if KVM "writes" random GPRs in the VMSA? I can't
> see how the guest would achieve any level of security if it wantonly consumes
> GPRs, i.e. it's the guest's responsibility to consume only the relevant GPRs.

Right, the guest should only read the registers that it is expecting to be
provided by the hypervisor as set forth in the GHCB spec. It shouldn't
load any other registers that the hypervisor provides. The Linux SEV-ES
guest support follows this model and will only load the registers that are
specified via the GHCB spec for a particular NAE event, ignoring anything
else provided.

>
> If that holds true, then avoiding the copying isn't functionally necessary, and
> is really just a performance optimization. One potentially crazy idea would be
> to change vcpu->arch.regs to be a pointer (defaults to a __regs array), and then
> have SEV-ES switch it to point directly at the VMSA array (I think the layout
> is identical for x86-64?).

That would be nice, but it isn't quite laid out like that. Before SEV-ES
support, RAX and RSP were the only GPRs saved. With the arrival of SEV-ES,
the remaining registers were added to the VMSA, but a number of bytes
after RAX and RSP. So right now, the new register block in the VMSA has
reserved areas where RAX and RSP would otherwise be (see offset 0x300 in
the VMSA layout of the APM volume 2,
https://www.amd.com/system/files/TechDocs/24593.pdf).
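
To illustrate why a plain pointer swap does not line up, here is a toy
model of that kind of layout. The struct, its field order and padding, and
every toy_ name are made up for illustration and are far smaller than the
real VMSA; only the idea of legacy RAX/RSP slots plus reserved holes in the
newer register block comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy save area: NOT the real VMSA layout, just the same shape of problem. */
struct toy_save_area {
	uint64_t rsp;           /* legacy slot, saved before the new block existed */
	uint64_t pad0[3];
	uint64_t rax;           /* legacy slot */
	uint64_t pad1[4];
	/* newer GPR block: holes where RAX and RSP would naturally sit */
	uint64_t reserved_rax;
	uint64_t rcx;
	uint64_t rdx;
	uint64_t rbx;
	uint64_t reserved_rsp;
	uint64_t rbp;
};

/* Index of a field when the save area is viewed as an array of u64 slots. */
#define TOY_ENTRY(field) (offsetof(struct toy_save_area, field) / sizeof(uint64_t))

/* Hypothetical register numbering, in the usual x86 encoding order. */
enum toy_reg { TOY_RAX, TOY_RCX, TOY_RDX, TOY_RBX, TOY_RSP, TOY_RBP, NR_TOY_REGS };

/* Because of the holes, a lookup table is needed instead of base + reg. */
static const size_t toy_reg_to_entry[NR_TOY_REGS] = {
	[TOY_RAX] = TOY_ENTRY(rax),
	[TOY_RCX] = TOY_ENTRY(rcx),
	[TOY_RDX] = TOY_ENTRY(rdx),
	[TOY_RBX] = TOY_ENTRY(rbx),
	[TOY_RSP] = TOY_ENTRY(rsp),
	[TOY_RBP] = TOY_ENTRY(rbp),
};

/* Read one register by slot index; memcpy avoids aliasing concerns. */
static uint64_t toy_read_reg(const struct toy_save_area *sa, enum toy_reg reg)
{
	uint64_t val;

	memcpy(&val,
	       (const unsigned char *)sa + toy_reg_to_entry[reg] * sizeof(uint64_t),
	       sizeof(val));
	return val;
}
```

In this toy, indexing the newer block directly by register number would
land on reserved_rax and reserved_rsp rather than on the live values, which
is the mismatch that rules out pointing regs straight at the save area.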

I might be able to copy the RAX and RSP values into those reserved slots
before the VMSA is encrypted (or the GHCB returned), assuming those fields
would stay reserved, but I don't think that can be guaranteed.

Let me see if I can put something together using regs_avail and regs_dirty.
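
A rough userspace model of what that could look like, assuming one
available bit and one dirty bit per register. Every name below (model_vcpu,
model_ghcb, the model_* helpers) is hypothetical and only mirrors the idea;
none of it is the KVM implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices, a stand-in for KVM's enum kvm_reg. */
enum model_reg { REG_RAX, REG_RCX, REG_RDX, REG_RBX, NR_MODEL_REGS };

/* Hypothetical shared area: one value and one valid bit per register. */
struct model_ghcb {
	uint64_t regs[NR_MODEL_REGS];
	uint32_t valid;
};

struct model_vcpu {
	uint64_t regs[NR_MODEL_REGS];
	uint32_t regs_avail;    /* registers the guest actually provided */
	uint32_t regs_dirty;    /* registers the host wrote since the exit */
};

/* On exit: copy only the registers flagged valid, then clear the bitmap. */
static void model_sync_on_exit(struct model_vcpu *v, struct model_ghcb *g)
{
	v->regs_avail = 0;
	v->regs_dirty = 0;
	for (int r = 0; r < NR_MODEL_REGS; r++) {
		if (g->valid & (1u << r)) {
			v->regs[r] = g->regs[r];
			v->regs_avail |= 1u << r;
		}
	}
	g->valid = 0;
}

/* Read path: fail instead of silently returning stale or zero data. */
static int model_reg_read(const struct model_vcpu *v, int r, uint64_t *val)
{
	assert(r >= 0 && r < NR_MODEL_REGS);
	if (!(v->regs_avail & (1u << r)))
		return -1;
	*val = v->regs[r];
	return 0;
}

/* Write path: cache the value and remember that it must be propagated. */
static void model_reg_write(struct model_vcpu *v, int r, uint64_t val)
{
	assert(r >= 0 && r < NR_MODEL_REGS);
	v->regs[r] = val;
	v->regs_avail |= 1u << r;
	v->regs_dirty |= 1u << r;
}

/* On entry: push only dirty registers back and mark them valid. */
static void model_sync_on_entry(struct model_vcpu *v, struct model_ghcb *g)
{
	for (int r = 0; r < NR_MODEL_REGS; r++) {
		if (v->regs_dirty & (1u << r)) {
			g->regs[r] = v->regs[r];
			g->valid |= 1u << r;
		}
	}
	v->regs_dirty = 0;
}
```

With this shape the per-register hook goes away: the copying happens once
per exit and once per entry, and the read helper has an obvious place to
report a register the guest never exposed.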

>
>>>> @@ -4012,6 +4052,99 @@ static bool svm_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
>>>> (svm->vmcb->control.intercept & (1ULL << INTERCEPT_INIT));
>>>> }
>>>>
>>>> +/*
>>>> + * These return values represent the offset in quad words within the VM save
>>>> + * area. This allows them to be accessed by casting the save area to a u64
>>>> + * array.
>>>> + */
>>>> +#define VMSA_REG_ENTRY(_field) (offsetof(struct vmcb_save_area, _field) / sizeof(u64))
>>>> +#define VMSA_REG_UNDEF VMSA_REG_ENTRY(valid_bitmap)
>>>> +static inline unsigned int vcpu_to_vmsa_entry(enum kvm_reg reg)
>>>> +{
>>>> +	switch (reg) {
>>>> +	case VCPU_REGS_RAX: return VMSA_REG_ENTRY(rax);
>>>> +	case VCPU_REGS_RBX: return VMSA_REG_ENTRY(rbx);
>>>> +	case VCPU_REGS_RCX: return VMSA_REG_ENTRY(rcx);
>>>> +	case VCPU_REGS_RDX: return VMSA_REG_ENTRY(rdx);
>>>> +	case VCPU_REGS_RSP: return VMSA_REG_ENTRY(rsp);
>>>> +	case VCPU_REGS_RBP: return VMSA_REG_ENTRY(rbp);
>>>> +	case VCPU_REGS_RSI: return VMSA_REG_ENTRY(rsi);
>>>> +	case VCPU_REGS_RDI: return VMSA_REG_ENTRY(rdi);
>>>> +#ifdef CONFIG_X86_64
>
> Is KVM SEV-ES going to support 32-bit builds?

No, SEV-ES won't support 32-bit builds and since those fields are always
defined, I can just remove this #ifdef.

>
>>>> +	case VCPU_REGS_R8: return VMSA_REG_ENTRY(r8);
>>>> +	case VCPU_REGS_R9: return VMSA_REG_ENTRY(r9);
>>>> +	case VCPU_REGS_R10: return VMSA_REG_ENTRY(r10);
>>>> +	case VCPU_REGS_R11: return VMSA_REG_ENTRY(r11);
>>>> +	case VCPU_REGS_R12: return VMSA_REG_ENTRY(r12);
>>>> +	case VCPU_REGS_R13: return VMSA_REG_ENTRY(r13);
>>>> +	case VCPU_REGS_R14: return VMSA_REG_ENTRY(r14);
>>>> +	case VCPU_REGS_R15: return VMSA_REG_ENTRY(r15);
>>>> +#endif
>>>> +	case VCPU_REGS_RIP: return VMSA_REG_ENTRY(rip);
>>>> +	default:
>>>> +		WARN_ONCE(1, "unsupported VCPU to VMSA register conversion\n");
>>>> +		return VMSA_REG_UNDEF;
>>>> +	}
>>>> +}
>>>> +
>>>> +/* For SEV-ES guests, populate the vCPU register from the appropriate VMSA/GHCB */
>>>> +static void svm_reg_read_override(struct kvm_vcpu *vcpu, enum kvm_reg reg)
>>>> +{
>>>> +	struct vmcb_save_area *vmsa;
>>>> +	struct vcpu_svm *svm;
>>>> +	unsigned int entry;
>>>> +	unsigned long val;
>>>> +	u64 *vmsa_reg;
>>>> +
>>>> +	if (!sev_es_guest(vcpu->kvm))
>>>> +		return;
>>>> +
>>>> +	entry = vcpu_to_vmsa_entry(reg);
>>>> +	if (entry == VMSA_REG_UNDEF)
>>>> +		return;
>>>> +
>>>> +	svm = to_svm(vcpu);
>>>> +	vmsa = get_vmsa(svm);
>>>> +	vmsa_reg = (u64 *)vmsa;
>>>> +	val = (unsigned long)vmsa_reg[entry];
>>>> +
>>>> +	/* If a GHCB is mapped, check the bitmap of valid entries */
>>>> +	if (svm->ghcb) {
>>>> +		if (!test_bit(entry, (unsigned long *)vmsa->valid_bitmap))
>>>> +			val = 0;
>>>
>>> Is KVM relying on this being 0? Would it make sense to stuff something like
>>> 0xaaaa... or 0xdeadbeefdeadbeef so that consumption of bogus data is more
>>> noticeable?
>>
>> No, KVM isn't relying on this being 0. I thought about using something
>> other than 0 here, but settled on just using 0. I'm open to changing that,
>> though. I'm not sure if there's an easy way to short-circuit the intercept
>> and respond back with an error at this point; that would be optimal.
>
> Ya, responding with an error would be ideal. At this point, we're taking the
> same lazy approach for TDX and effectively consuming garbage if the guest
> requests emulation but doesn't expose the necessary GPRs. That being said,
> TDX's guest/host ABI is quite rigid, so all the "is this register valid"
> checks could be hardcoded into the higher level "emulation" flows.
>
> Would that also be an option for SEV-ES?

Meaning adding the expected input checks at VMEXIT time in the VMGEXIT
handler, so that accesses later are guaranteed to be good? That is an
option and might also address one of the other points you brought up
about receiving exits that are not supported/expected.
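
As a minimal sketch of that kind of up-front check, assuming a table that
maps each supported exit to the registers the guest must have marked valid.
The exit codes, register numbering, and required-register sets below are
illustrative placeholders, not taken from the GHCB spec or from KVM, and
all sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbering and valid-bitmap encoding. */
enum sketch_reg { SKETCH_RAX, SKETCH_RBX, SKETCH_RCX, SKETCH_RDX };
#define SKETCH_BIT(r) (1u << (r))

/* Hypothetical exit codes; real values would come from the spec. */
enum sketch_exit {
	SKETCH_EXIT_CPUID = 1,
	SKETCH_EXIT_MSR_READ,
	SKETCH_EXIT_MSR_WRITE,
};

struct sketch_rule {
	int code;               /* exit code this rule applies to */
	uint32_t required;      /* registers that must be marked valid */
};

/* Illustrative requirements only. */
static const struct sketch_rule sketch_rules[] = {
	{ SKETCH_EXIT_CPUID,     SKETCH_BIT(SKETCH_RAX) | SKETCH_BIT(SKETCH_RCX) },
	{ SKETCH_EXIT_MSR_READ,  SKETCH_BIT(SKETCH_RCX) },
	{ SKETCH_EXIT_MSR_WRITE, SKETCH_BIT(SKETCH_RCX) | SKETCH_BIT(SKETCH_RAX) |
				 SKETCH_BIT(SKETCH_RDX) },
};

/*
 * Return true only if the exit is known and every required register is
 * valid. Unknown exits are rejected, which also covers exits that are not
 * supported or expected.
 */
static bool sketch_validate_exit(int code, uint32_t valid)
{
	for (size_t i = 0; i < sizeof(sketch_rules) / sizeof(sketch_rules[0]); i++) {
		if (sketch_rules[i].code == code)
			return (valid & sketch_rules[i].required) ==
			       sketch_rules[i].required;
	}
	return false;
}
```

Doing the check once in the handler means later register reads need no
per-access validity test, and a failed check gives a single place to
respond to the guest with an error.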

Thanks,
Tom
