Re: [PATCH 03/10] KVM: SVM: Terminate the VM if a SEV-ES+ guest is run with an invalid VMSA

From: Tom Lendacky
Date: Mon Feb 24 2025 - 19:00:39 EST


On 2/24/25 16:55, Sean Christopherson wrote:
> On Mon, Feb 24, 2025, Tom Lendacky wrote:
>> On 2/18/25 19:26, Sean Christopherson wrote:
>>> -void pre_sev_run(struct vcpu_svm *svm, int cpu)
>>> +int pre_sev_run(struct vcpu_svm *svm, int cpu)
>>> {
>>> struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
>>> - unsigned int asid = sev_get_asid(svm->vcpu.kvm);
>>> + struct kvm *kvm = svm->vcpu.kvm;
>>> + unsigned int asid = sev_get_asid(kvm);
>>> +
>>> + /*
>>> + * Terminate the VM if userspace attempts to run the vCPU with an
>>> + * invalid VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after
>>> + * an SNP AP Destroy event.
>>> + */
>>> + if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa)) {
>>> + kvm_vm_dead(kvm);
>>> + return -EIO;
>>> + }
>>
>> If a VMRUN is performed with the vmsa_pa value set to INVALID_PAGE, the
>> VMRUN will fail and KVM will dump the VMCB and exit back to userspace
>
> I haven't tested, but based on what the APM says, I'm pretty sure this would crash
> the host due to a #GP on VMRUN, i.e. due to the resulting kvm_spurious_fault().
>
> IF (rAX contains an unsupported physical address)
> EXCEPTION [#GP]

Well that's for the VMCB, the VMSA is pointed to by the VMCB and results
in a VMEXIT code of -1 if you don't supply a proper page-aligned,
physical address.

>
>> with KVM_EXIT_INTERNAL_ERROR.
>>
>> Is doing this preferrable to that?
>
> Even if AMD guaranteed that the absolute worst case scenario is a failed VMRUN
> with zero side effects, doing VMRUN with a bad address should be treated as a
> KVM bug.

Fair.

>
>> If so, should a vcpu_unimpl() message be issued, too, to better identify the
>> reason for marking the VM dead?
>
> My vote is no. At some point we need to assume userspace possesess a reasonable
> level of competency and sanity.
>
>>> static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>>> @@ -4231,7 +4233,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>>> if (force_immediate_exit)
>>> smp_send_reschedule(vcpu->cpu);
>>>
>>> - pre_svm_run(vcpu);
>>> + if (pre_svm_run(vcpu))
>>> + return EXIT_FASTPATH_EXIT_USERSPACE;

In testing this out, I think userspace continues on because I eventually
get:

KVM_GET_PIT2 failed: Input/output error
/tmp/cmdline.98112: line 1: 98163 Aborted (core dumped) ...

Haven't looked too close, but maybe an exit_reason needs to be set to
get qemu to quit sooner?

Thanks,
Tom


>>
>> Since the return code from pre_svm_run() is never used, should it just
>> be a bool function, then?
>
> Hard no. I strongly dislike boolean returns for functions that aren't obviously
> predicates, because it's impossible to determine the polarity of the return value
> based solely on the prototype. This leads to bugs that are easier to detect with
> 0/-errno return, e.g. returning -EINVAL in a happy path stands out more than
> returning the wrong false/true value.
>
> Case in point (because I just responded to another emain about this function),
> what's the polarity of this helper? :-)
>
> static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries,
> __u32 num_entries, unsigned int ioctl_type)