Re: [PATCH v2] KVM: halt-polling: poll if emulated lapic timer will fire soon

From: David Matlack
Date: Fri May 20 2016 - 14:37:28 EST


On Thu, May 19, 2016 at 7:04 PM, Yang Zhang <yang.zhang.wz@xxxxxxxxx> wrote:
> On 2016/5/20 2:36, David Matlack wrote:
>>
>> On Thu, May 19, 2016 at 11:01 AM, David Matlack <dmatlack@xxxxxxxxxx> wrote:
>>>
>>> On Thu, May 19, 2016 at 6:27 AM, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>>>>
>>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>>
>>>> If an emulated lapic timer will fire soon (within 10us, which is the
>>>> base of dynamic halt-polling and the lower end of message-passing
>>>> workload latency, since TCP_RR's poll time is < 10us), we can treat it
>>>> as a short halt and poll until it fires. The fire callback
>>>> apic_timer_fn() will set KVM_REQ_PENDING_TIMER, and this flag is
>>>> checked during the busy poll. This avoids the context-switch overhead
>>>> and the latency of waking up the vCPU.
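A minimal userspace model of the poll-window choice described above, following the kvm_vcpu_block() hunk of the patch: keep the dynamic halt-polling window if there is one, otherwise poll only until a timer that fires within the 10us base. It assumes the timer's remaining time is already available in nanoseconds, with UINT64_MAX standing in for the patch's -1ULL "not armed" value; poll_window_ns() is a hypothetical name, not a KVM function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the patch's -1ULL: the lapic hrtimer is not armed. */
#define TIMER_NOT_ARMED UINT64_MAX

/*
 * Hypothetical model (not a KVM function) of the poll window the patch
 * computes in kvm_vcpu_block(). Returns the busy-poll duration in ns;
 * 0 means "skip polling and block right away".
 */
static uint64_t poll_window_ns(uint64_t halt_poll_ns,
                               uint64_t timer_remaining_ns,
                               uint64_t poll_base_ns)
{
    /* A zero base would silently disable timer-based polling. */
    assert(poll_base_ns > 0);

    /* Dynamic halt-polling already chose a window: keep it. */
    if (halt_poll_ns)
        return halt_poll_ns;

    /* Otherwise poll only until a timer that fires within the base. */
    if (timer_remaining_ns < poll_base_ns)
        return timer_remaining_ns;

    return 0;
}
```

With the 10us base from the patch, a vCPU whose halt_poll_ns has shrunk to 0 would poll for 4000ns when its timer is due in 4us, but block at once when the timer is due in 50us or is not armed.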
>>>
>>>
>>> If I understand correctly, your patch aims to reduce the latency of
>>> (APIC Timer expires) -> (Guest resumes execution) using halt-polling.
>>> Let me know if I'm misunderstanding.
>>>
>>> In general, I don't think it makes sense to poll for timer interrupts.
>>> We know when the timer interrupt is going to arrive. If we care about
>>> the latency of delivering that interrupt to the guest, we should
>>> program the hrtimer to wake us up slightly early, and then deliver the
>>> virtual timer interrupt right on time (I think KVM's TSC Deadline
>>> Timer emulation already does this).
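As a rough illustration of that alternative (a sketch only: the function names are hypothetical, times are plain nanoseconds on one clock rather than guest TSC values, and KVM's actual implementation differs in detail): arm the host hrtimer advance_ns before the guest's deadline, then busy-wait the remainder after the early wakeup so the virtual interrupt is injected on time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not KVM functions. Times are ns on one clock.
 *
 * When to arm the host hrtimer: advance_ns ahead of the guest deadline,
 * clamped at 0 so a large advance cannot wrap around.
 */
static uint64_t host_timer_expiry_ns(uint64_t deadline_ns, uint64_t advance_ns)
{
    return deadline_ns > advance_ns ? deadline_ns - advance_ns : 0;
}

/*
 * After the early wakeup at now_ns: how long to spin before injecting
 * the virtual timer interrupt (0 if the deadline has already passed).
 */
static uint64_t residual_spin_ns(uint64_t deadline_ns, uint64_t now_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

When the hrtimer fires on time, the spin is bounded by roughly advance_ns, which is why the advance would be sized to the host's typical wakeup latency rather than made large.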
>>
>>
>> (It looks like the way to enable this feature is to set the module
>> parameter lapic_timer_advance_ns and make sure your guest is using the
>> TSC Deadline timer instead of the APIC Timer.)
>
>
> This feature is slightly different from the current advance-expiration approach.
> Advance expiration relies on the VCPU running (it polls before vmentry).
> But in some cases, the timer interrupt may be blocked by another thread (i.e.,
> the IF bit is clear) and the VCPU cannot be scheduled to run immediately. So
> even if the timer is advanced, the VCPU may still see the latency. Polling is
> different: it ensures the VCPU is aware of the timer expiration before it is
> scheduled out.
>
>>
>>> I'm curious to know if this scheme
>>> would give the same performance improvement to iperf as your patch.
>>>
>>> We discussed this a bit on the mailing list before
>>> (https://lkml.org/lkml/2016/3/29/680). I'd like to see halt-polling
>>> and timer interrupts go in the opposite direction: if the next timer
>>> event (from any timer) is less than vcpu->halt_poll_ns, don't poll at
>>> all.
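A minimal sketch of that proposal, assuming the time until the next timer event (from any timer) is already known in nanoseconds; should_halt_poll() is a hypothetical name, not code from KVM or from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not KVM code: poll only if dynamic halt-polling
 * chose a window and no timer event is due inside that window. If a
 * timer fires sooner, block and let the host timer wake the vCPU.
 */
static bool should_halt_poll(uint64_t halt_poll_ns, uint64_t next_timer_ns)
{
    if (halt_poll_ns == 0)
        return false;
    return next_timer_ns >= halt_poll_ns;
}
```

Compared with the patch, the comparison points the other way: a nearby timer suppresses polling instead of triggering it.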
>>>
>>>>
>>>> iperf TCP gets a ~6% bandwidth improvement.
>>>
>>>
>>> Can you explain why your patch results in this bandwidth improvement?
>
>
> It should be reasonable. I have seen the same kind of improvement with a
> context-switch benchmark: the latency is reduced from ~2600ns to ~2300ns with
> a similar mechanism (the same idea, but a different implementation).

It's not obvious to me why polling for a timer interrupt would improve
context switch latency. Can you explain a bit more?

>
>>>
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>>>> Cc: David Matlack <dmatlack@xxxxxxxxxx>
>>>> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxx>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>> ---
>>>> v1 -> v2:
>>>> * add return statement to non-x86 archs
>>>> * capture never expire case for x86 (hrtimer is not started)
>>>>
>>>> arch/arm/include/asm/kvm_host.h | 4 ++++
>>>> arch/arm64/include/asm/kvm_host.h | 4 ++++
>>>> arch/mips/include/asm/kvm_host.h | 4 ++++
>>>> arch/powerpc/include/asm/kvm_host.h | 4 ++++
>>>> arch/s390/include/asm/kvm_host.h | 4 ++++
>>>> arch/x86/kvm/lapic.c | 11 +++++++++++
>>>> arch/x86/kvm/lapic.h | 1 +
>>>> arch/x86/kvm/x86.c | 5 +++++
>>>> include/linux/kvm_host.h | 1 +
>>>> virt/kvm/kvm_main.c | 14 ++++++++++----
>>>> 10 files changed, 48 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index 4cd8732..a5fd858 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -284,6 +284,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>> static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return -1ULL;
>>>> +}
>>>>
>>>> static inline void kvm_arm_init_debug(void) {}
>>>> static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index d49399d..94e227a 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -359,6 +359,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>>> static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return -1ULL;
>>>> +}
>>>>
>>>> void kvm_arm_init_debug(void);
>>>> void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
>>>> index 9a37a10..456bc42 100644
>>>> --- a/arch/mips/include/asm/kvm_host.h
>>>> +++ b/arch/mips/include/asm/kvm_host.h
>>>> @@ -813,6 +813,10 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>>>> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return -1ULL;
>>>> +}
>>>> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>>
>>>> #endif /* __MIPS_KVM_HOST_H__ */
>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>> index ec35af3..5986c79 100644
>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> @@ -729,5 +729,9 @@ static inline void kvm_arch_exit(void) {}
>>>> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return -1ULL;
>>>> +}
>>>>
>>>> #endif /* __POWERPC_KVM_HOST_H__ */
>>>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>>>> index 37b9017..bdb01a1 100644
>>>> --- a/arch/s390/include/asm/kvm_host.h
>>>> +++ b/arch/s390/include/asm/kvm_host.h
>>>> @@ -696,6 +696,10 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>>> struct kvm_memory_slot *slot) {}
>>>> static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>>> static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>>> +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return -1ULL;
>>>> +}
>>>>
>>>> void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
>>>>
>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>> index bbb5b28..cfeeac3 100644
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -256,6 +256,17 @@ static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
>>>> return apic->lapic_timer.timer_mode == APIC_LVT_TIMER_TSCDEADLINE;
>>>> }
>>>>
>>>> +u64 apic_get_timer_expire(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + struct kvm_lapic *apic = vcpu->arch.apic;
>>>> + struct hrtimer *timer = &apic->lapic_timer.timer;
>>>> +
>>>> + if (!hrtimer_active(timer))
>>>> + return -1ULL;
>>>> + else
>>>> + return ktime_to_ns(hrtimer_get_remaining(timer));
>>>> +}
>>>> +
>>>> static inline int apic_lvt_nmi_mode(u32 lvt_val)
>>>> {
>>>> return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>> index 891c6da..ee4da6c 100644
>>>> --- a/arch/x86/kvm/lapic.h
>>>> +++ b/arch/x86/kvm/lapic.h
>>>> @@ -212,4 +212,5 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
>>>> struct kvm_lapic_irq *irq,
>>>> struct kvm_vcpu **dest_vcpu);
>>>> int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
>>>> const unsigned long *bitmap, u32 bitmap_size);
>>>> +u64 apic_get_timer_expire(struct kvm_vcpu *vcpu);
>>>> #endif
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index a8c7ca3..9b5ad99 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -7623,6 +7623,11 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
>>>> struct static_key kvm_no_apic_vcpu __read_mostly;
>>>> EXPORT_SYMBOL_GPL(kvm_no_apic_vcpu);
>>>>
>>>> +u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + return apic_get_timer_expire(vcpu);
>>>> +}
>>>> +
>>>> int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>>> {
>>>> struct page *page;
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index b1fa8f1..14d6c23 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -663,6 +663,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>>>> void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
>>>> void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>>>> void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>>>> +u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu);
>>>>
>>>> void kvm_flush_remote_tlbs(struct kvm *kvm);
>>>> void kvm_reload_remote_mmus(struct kvm *kvm);
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index dd4ac9d..e4bb30b 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -78,6 +78,9 @@ module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
>>>> static unsigned int halt_poll_ns_shrink;
>>>> module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
>>>>
>>>> +/* lower-end of message passing workload latency TCP_RR's poll time < 10us */
>>>> +static unsigned int halt_poll_ns_base = 10000;
>>>> +
>>>> /*
>>>> * Ordering of locks:
>>>> *
>>>> @@ -1966,7 +1969,7 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
>>>> grow = READ_ONCE(halt_poll_ns_grow);
>>>> /* 10us base */
>>>> if (val == 0 && grow)
>>>> - val = 10000;
>>>> + val = halt_poll_ns_base;
>>>> else
>>>> val *= grow;
>>>>
>>>> @@ -2014,12 +2017,15 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>>>> ktime_t start, cur;
>>>> DECLARE_SWAITQUEUE(wait);
>>>> bool waited = false;
>>>> - u64 block_ns;
>>>> + u64 block_ns, delta, remaining;
>>>>
>>>> + remaining = kvm_arch_timer_remaining(vcpu);
>>>> start = cur = ktime_get();
>>>> - if (vcpu->halt_poll_ns) {
>>>> - ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>>>> + if (vcpu->halt_poll_ns || remaining < halt_poll_ns_base) {
>>>> + ktime_t stop;
>>>>
>>>> + delta = vcpu->halt_poll_ns ? vcpu->halt_poll_ns : remaining;
>>>> + stop = ktime_add_ns(ktime_get(), delta);
>>>> ++vcpu->stat.halt_attempted_poll;
>>>> do {
>>>> /*
>>>> --
>>>> 1.9.1
>>>>
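For readers checking the `remaining < halt_poll_ns_base` comparison in the kvm_vcpu_block() hunk, here is a userspace model of the value apic_get_timer_expire() returns. It is a sketch with a hypothetical name: a bool stands in for hrtimer_active(), and a signed nanosecond count stands in for ktime_to_ns(hrtimer_get_remaining()), which goes negative once the expiry time has passed. Because the function returns u64, such a negative value wraps to a very large number, so a timer that is past its expiry but still active does not trigger polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of apic_get_timer_expire(), not the kernel function:
 * timer_active stands in for hrtimer_active(), remaining_ns for
 * ktime_to_ns(hrtimer_get_remaining()).
 */
static uint64_t model_timer_expire(bool timer_active, int64_t remaining_ns)
{
    if (!timer_active)
        return UINT64_MAX;          /* same bits as -1ULL: "never expires" */
    return (uint64_t)remaining_ns;  /* negative values wrap to huge u64s */
}
```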
>>
>
>
> --
> best regards
> yang