Re: [PATCH v4 2/2] KVM: nVMX: Fix preemption timer bit set in vmcs02 even if L1 doesn't enable it

From: Wanpeng Li
Date: Fri Jul 08 2016 - 10:08:14 EST


2016-07-08 21:58 GMT+08:00 Wanpeng Li <kernellwp@xxxxxxxxx>:
> 2016-07-08 18:18 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>>
>>
>> On 08/07/2016 02:38, Wanpeng Li wrote:
>>> 2016-07-07 22:11 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>>>>
>>>>
>>>> On 07/07/2016 15:23, Wanpeng Li wrote:
>>>>>>>>>
>>>>>>>>> if (kvm_lapic_hv_timer_in_use(vcpu) &&
>>>>>>>>> + (is_guest_mode(vcpu) ||
>>>>>>>>> kvm_x86_ops->set_hv_timer(vcpu,
>>>>>>>>> - kvm_get_lapic_tscdeadline_msr(vcpu)))
>>>>>>>>> + kvm_get_lapic_tscdeadline_msr(vcpu))))
>>>>>>>>> kvm_lapic_switch_to_sw_timer(vcpu);
>>>>>>>>> if (check_tsc_unstable()) {
>>>>>>>>> u64 offset = kvm_compute_tsc_offset(vcpu,
>>>>>>>>>
>>>>>>>
>>>>>>> Thanks, this is good as a fallback. I'll try to fix it by getting the
>>>>>>> pin-based execution controls right but if I fail this patch is okay.
>>>>> I believe we still need this patch even if you implement "L1 TSC
>>>>> deadline timer to trigger while L2 is running" eventually, the codes
>>>>> you posted before:
>>>>>
>>>>> exec_control = vmcs12->pin_based_vm_exec_control;
>>>>> +exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>>>>> - exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>> + if (vmx->hv_deadline_tsc == -1)
>>>>> + exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>>>>>
>>>>> So there is still case the preemption timer bit of vmcs02 is not set,
>>>>> however, the scenario I mentioned above in kvm_arch_vcpu_load() will
>>>>> set it unnecessary.
>>>>
>>>> kvm_x86_ops->set_hv_timer _will_ set the preemption timer bit of vmcs02
>>>> if vmcs02 is the loaded one.
>>>>
>>>> This can happen if L2 has access to L1's local APIC registers (i.e. L1
>>>> passes the local APIC instead of emulating it, as is the case in a
>>>> partitioning hypervisor). While L2 runs, it writes to the TSC deadline
>>>> MSR of L1. This causes a call to kvm_x86_ops->set_hv_timer while the
>>>> active VMCS is a vmcs02.
>>>
>>> Yes, in the scenario you pointed out the call to
>>> kvm_x86_ops->set_hv_timer while the active VMCS is vmcs02 is correct,
>>> however, in the scenario I mentioned in the patch description is not
>>> correct even if enable "L1 TSC deadline timer to trigger while L2 is
>>> running".
>>
>> It doesn't help that you have not explained how to reproduce the
>> bug---this is what the cover letter and commit messages are for, too.
>
> I believe you pointed out this before:
>
> | The patch looks correct, but I'm not sure how you get a preemption
> | timer vmexit while vmcs02 is active:
> |
> | exec_control = vmcs12->pin_based_vm_exec_control;
> | exec_control |= vmcs_config.pin_based_exec_ctrl;
> | exec_control &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
>
> We can't get preemption timer vmexit which vmcs02 is loaded since
> preemtion timer bit in vmcs02 is not set, then how can we get the
> incorrect preemption timer vmexit in nested guest which patch 1 fixed?
> Because the scenario I mentioned in patch 2 set vmcs02.
>
> w/o patch1 + w/o enable "L1 TSC deadline timer to trigger while L2 is
> running" => we will get the bug calltrace mentioned in patch1 since
> incorrect vmcs02 bit is set due to the bug mentioned in patch 2. So
> apply patch2 can fix it.
>
> However, after enable "L1 TSC deadline timer to trigger while L2 is
> running", we should have patch 1 to correctly capture nested vmexit
> for preemption timer.
>
> My setup is L0 and L1 both are latest kvm queue branch w/ Yunhong's
> preemption timer enable patches and my previous "fix missed
> cancellation of TSC deadline timer" patches. I always enable
> preemption timer in L0, but try to enable or disable preemption timer
> in L1.

Btw, my L1 is a full dynticks guest in order that hrtimer in L1 will
be heavily used.

Regards,
Wanpeng Li