RE: [PATCH] KVM: x86: use TPAUSE to replace PAUSE in halt polling

From: Mi, Dapeng1
Date: Thu Aug 25 2022 - 07:08:22 EST


> From: David Laight <David.Laight@xxxxxxxxxx>
> Sent: Wednesday, August 24, 2022 10:08 PM
> To: Mi, Dapeng1 <dapeng1.mi@xxxxxxxxx>; rafael@xxxxxxxxxx;
> daniel.lezcano@xxxxxxxxxx; pbonzini@xxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> kvm@xxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH] KVM: x86: use TPAUSE to replace PAUSE in halt polling
>
> From: Dapeng Mi
> > Sent: 24 August 2022 10:11
> >
> > TPAUSE is a new instruction on Intel processors which can instruct
> > processor enters a power/performance optimized state. Halt polling
> > uses PAUSE instruction to wait vCPU is waked up. The polling time
> > could be long and cause extra power consumption in some cases.
> >
> > Use TPAUSE to replace the PAUSE instruction in halt polling to get a
> > better power saving and performance.
>
> What is the effect on wakeup latency?
> Quite often that is far more important than a bit of power saving.

In theory, the increase in wakeup latency should be less than 1us, so I expect the impact to be minimal. I ran two scheduling-related benchmarks, hackbench and schbench, and did not see any obvious performance impact from this change.

While these two scheduling benchmarks ran on the host, an FIO workload ran simultaneously in a Linux VM. FIO triggers a large number of HLT VM-exits, which in turn trigger halt polling, so we can see how TPAUSE affects performance.

Here are the hackbench and schbench data on an Intel ADL platform.

Hackbench     base     TPAUSE    %delta
Group-1       0.056    0.052      7.14%
Group-4       0.165    0.164      0.61%
Group-8       0.313    0.309      1.28%
Group-16      0.834    0.842     -0.96%

Schbench - Latency percentiles (usec)

                     base     TPAUSE
./schbench -m 1
    50.0th             15         13
    99.0th            221        203
./schbench -m 2
    50.0th             26         23
    99.0th          16368      16544
./schbench -m 4
    50.0th             56         60
    99.0th          33984      34112

The schbench results are not very stable, but the data are at the same level in both cases.

> The automatic entry of sleep states is a PITA already.
> Block 30 RT threads in cv_wait() and then do cv_broadcast().
> Use ftrace to see just how long it takes the last thread to wake up.

I think this test is similar to hackbench and schbench, so it should give similar results.

Anyway, performance and power are a tradeoff; it depends on which side we consider more important.

>
> David