Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

From: Wanpeng Li
Date: Tue Sep 01 2015 - 18:30:31 EST

On 9/2/15 5:45 AM, David Matlack wrote:
On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li wrote:
v3 -> v4:
* bring back growing vcpu->halt_poll_ns when an interrupt arrives and
shrinking it when an idle VCPU is detected

v2 -> v3:
* grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
* drop the macros and hard-code the numbers in the param definitions
* update the comments "5-7 us"
* remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
vcpu->halt_poll_ns starts at zero
* drop the wrappers
* move the grow/shrink logic before "out:" w/ "if (waited)"
I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and, for VMs doing
message passing, performance equivalent to always-poll. It's a patch that's been
in my queue for a few weeks but I just haven't had the time to send it out. We can
win even more with your patchset by only polling as much as we need (via
dynamic growth/shrink). It also gives us a better place to stand for choosing
a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment correctly,
we have to time the length of each halt. Otherwise we hit some bad edge cases:

v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
the halts are long.

v4: v4 fixed the idle overhead problem but broke dynamic growth for message
passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
That means vcpu->halt_poll_ns will always be maxed out, even when the halt
time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.
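
As a rough sketch of that idea (not code from either patchset; the enum, the
function name, and the exact thresholds are assumptions made up for
illustration), the decision could look only at how long kvm_vcpu_block took:

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
enum poll_action { POLL_KEEP, POLL_GROW, POLL_SHRINK };

/*
 * Decide how to adjust the poll window from the measured block time.
 *   block_ns:    how long this halt lasted
 *   cur_poll_ns: current vcpu->halt_poll_ns
 *   max_poll_ns: upper bound on the poll window (assumed tunable)
 */
enum poll_action poll_decision(uint64_t block_ns, uint64_t cur_poll_ns,
                               uint64_t max_poll_ns)
{
    if (block_ns <= cur_poll_ns)
        return POLL_KEEP;    /* polling already covered this halt */
    if (block_ns > max_poll_ns)
        return POLL_SHRINK;  /* long halt: polling could never have caught it */
    return POLL_GROW;        /* short halt that outlasted the current window */
}
```

With this shape an idle VCPU (long halts) never grows from 0, and a message
passing VCPU only grows until the window covers its typical halt.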

Some thoughts for dynamic growth:
* Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
less than 1ms. 200 us has been a good value for always-poll. We can
probably go a bit higher once we have your patch. Maybe 500 us?

* The base case of dynamic growth (the first grow() after being at 0) should
be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
so 10 us would be a good base case.
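
To make the two points above concrete, here is a minimal sketch of a grow
step with a 10 us base case and a 500 us cap. The macro and function names
are hypothetical, and the doubling factor is an assumption; the real factor
would come from halt_poll_ns_grow.

```c
#include <stdint.h>

/* Illustrative values taken from the discussion above, not from the patch. */
#define BASE_POLL_NS  10000ULL   /* 10 us: first step after being at 0 */
#define MAX_POLL_NS   500000ULL  /* 500 us: below a 1 ms timer tick */
#define GROW_FACTOR   2ULL       /* assumed multiplier */

/* Return the next poll window after one grow step. */
uint64_t grow_poll_ns(uint64_t cur_ns)
{
    /* Starting from 0, multiplying would stay at 0, so jump to the base. */
    uint64_t next = cur_ns ? cur_ns * GROW_FACTOR : BASE_POLL_NS;

    /* Never exceed the configured maximum. */
    return next > MAX_POLL_NS ? MAX_POLL_NS : next;
}
```

Starting at 0, repeated grow steps give 10, 20, 40, 80, 160, 320 and then
500 us (capped), so a workload with < 10 us halts is covered after one step.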

How can I get your TCP_RR benchmark?

Wanpeng Li

v1 -> v2:
* change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
the module parameter
* use the shrink/grow matrix which is suggested by David
* set halt_poll_ns_max to 2ms

The downside of halt_poll_ns is that polling still happens for idle VCPUs,
which wastes CPU time. This patchset adds the ability to adjust halt_poll_ns
dynamically: it grows halt_poll_ns when an interrupt arrives and shrinks it
when an idle VCPU is detected.

There are two new module parameters that control how halt_poll_ns changes:
halt_poll_ns_grow and halt_poll_ns_shrink.

Tested with a high CPU overcommit ratio and pinned vCPUs. For always
halt-poll, halt_poll_ns is the default 500000 ns; for dynamic halt-poll, the
maximum halt_poll_ns is 2 ms. The %C0 figures below are from the Powertop
dump. The test method is mostly David's.

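+---------------+--------------+-------------------+
| w/o halt-poll | w/ halt-poll | dynamic halt-poll |
+---------------+--------------+-------------------+
|     ~0.9%     |    ~1.8%     |       ~1.2%       |
+---------------+--------------+-------------------+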

Always halt-poll increases CPU usage for idle vCPUs by ~0.9%, while dynamic
halt-poll increases it by only ~0.3%, which removes about 67% of the overhead
introduced by always halt-poll.
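
For reference, the 67% figure follows from the %C0 numbers above. A small
sketch of the arithmetic (the function name is made up for illustration):

```c
/*
 * Percentage of the always-poll overhead that dynamic polling removes.
 * All arguments are %C0 readings for the three configurations.
 */
double overhead_reduction_pct(double baseline, double always_poll,
                              double dynamic_poll)
{
    double always_overhead = always_poll - baseline;    /* 1.8 - 0.9 = 0.9 */
    double dynamic_overhead = dynamic_poll - baseline;  /* 1.2 - 0.9 = 0.3 */

    return 100.0 * (always_overhead - dynamic_overhead) / always_overhead;
}
```

With the measured values, overhead_reduction_pct(0.9, 1.8, 1.2) is roughly
66.7, i.e. (0.9 - 0.3) / 0.9.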

Wanpeng Li (3):
KVM: make halt_poll_ns per-VCPU
KVM: dynamic halt_poll_ns adjustment
KVM: trace kvm_halt_poll_ns grow/shrink

include/linux/kvm_host.h | 1 +
include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 50 +++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 78 insertions(+), 3 deletions(-)
