Re: [PATCH v3 1/2] KVM: x86: Check hypercall's exit to userspace generically

From: Huang, Kai
Date: Wed Oct 30 2024 - 20:49:43 EST



> @@ -10024,7 +10024,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>
> switch (nr) {
> case KVM_HC_VAPIC_POLL_IRQ:
> - ret = 0;
> + ret = 1;
> break;
> case KVM_HC_KICK_CPU:
> if (!guest_pv_has(vcpu, KVM_FEATURE_PV_UNHALT))
> @@ -10032,7 +10032,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>
> kvm_pv_kick_cpu_op(vcpu->kvm, a1);
> kvm_sched_yield(vcpu, a1);
> - ret = 0;
> + ret = 1;
> break;
> #ifdef CONFIG_X86_64
> case KVM_HC_CLOCK_PAIRING:
> @@ -10050,7 +10050,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> break;
>
> kvm_sched_yield(vcpu, a0);
> - ret = 0;
> + ret = 1;
> break;
> case KVM_HC_MAP_GPA_RANGE: {
> u64 gpa = a0, npages = a1, attrs = a2;
> @@ -10111,12 +10111,21 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> cpl = kvm_x86_call(get_cpl)(vcpu);
>
> ret = __kvm_emulate_hypercall(vcpu, nr, a0, a1, a2, a3, op_64_bit, cpl);
> - if (nr == KVM_HC_MAP_GPA_RANGE && !ret)
> - /* MAP_GPA tosses the request to the user space. */
> + if (!ret)
> return 0;
>
> - if (!op_64_bit)
> + /*
> + * KVM's ABI with the guest is that '0' is success, and any other value
> + * is an error code. Internally, '0' == exit to userspace (see above)
> + * and '1' == success, as KVM's de facto standard return codes are that
> + * plus -errno == failure. Explicitly check for '1' as some PV error
> + * codes are positive values.
> + */
> + if (ret == 1)
> + ret = 0;
> + else if (!op_64_bit)
> ret = (u32)ret;
> +
> kvm_rax_write(vcpu, ret);
>
> return kvm_skip_emulated_instruction(vcpu);
>

It appears below two cases are not covered correctly?

#ifdef CONFIG_X86_64
case KVM_HC_CLOCK_PAIRING:
ret = kvm_pv_clock_pairing(vcpu, a0, a1);
break;
#endif
case KVM_HC_SEND_IPI:
if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SEND_IPI))
break;

ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2, a3, op_64_bit);
break;

Looking at the code, seems at least for KVM_HC_CLOCK_PAIRING,
kvm_pv_clock_pairing() returns 0 on success, and the upstream behaviour is not
routing to userspace but writing 0 to vcpu's RAX and returning to guest. With
the change above, it immediately returns to userspace w/o writing 0 to RAX.

For KVM_HC_SEND_IPI, seems the upstream behaviour is the return value of
kvm_pv_send_ipi() is always written to vcpu's RAX reg, and we always just go
back to guest. With your change, the behaviour will be changed to:

1) when ret == 0, exit to userspace w/o writing 0 to vcpu's RAX,
2) when ret == 1, it is converted to 0 and then written to RAX.

This doesn't look like safe.