Some latency-sensitive workloads see an obvious performance
drop when running inside a VM, because overhead is amplified
there. The biggest cost I have observed is in the idle path.
This patch introduces a new mechanism that polls for a while before
entering the idle state. If a reschedule becomes necessary during the
poll, we avoid going through the heavy-overhead halt path entirely.
Here is the data we got when running the contextswitch benchmark to
measure latency (lower is better):
1. w/o patch:
2493.14 ns/ctxsw -- 200.3 %CPU
2. w/ patch:
halt_poll_threshold=10000 -- 1485.96 ns/ctxsw -- 201.0 %CPU
halt_poll_threshold=20000 -- 1391.26 ns/ctxsw -- 200.7 %CPU
halt_poll_threshold=30000 -- 1488.55 ns/ctxsw -- 200.1 %CPU
halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU
3. kvm dynamic poll
halt_poll_ns=10000 -- 2296.11 ns/ctxsw -- 201.2 %CPU
halt_poll_ns=20000 -- 2599.7 ns/ctxsw -- 201.7 %CPU
halt_poll_ns=30000 -- 2588.68 ns/ctxsw -- 211.6 %CPU
halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU
4. idle=poll
2050.1 ns/ctxsw -- 1003 %CPU
5. idle=mwait
2188.06 ns/ctxsw -- 206.3 %CPU