Re: cputime takes cstate into consideration

From: Konrad Rzeszutek Wilk
Date: Wed Jun 26 2019 - 10:53:59 EST

On Wed, Jun 26, 2019 at 12:33:30PM +0200, Thomas Gleixner wrote:
> On Wed, 26 Jun 2019, Wanpeng Li wrote:
> > After exposing MWAIT/MONITOR to a KVM guest, the guest can put the
> > physical CPU into a deeper C-state through the MWAIT instruction. However,
> > the top command on the host still shows 100% CPU utilization, since the
> > QEMU process is running even while a guest with power-management
> > capability executes MWAIT. We can in fact observe with powertop on the
> > host that the physical CPU has already entered a deeper C-state. Could we
> > take C-states into consideration when accounting cputime etc.?
> If MWAIT can be used inside the guest, then the host cannot distinguish
> between execution and being stuck in MWAIT.
> It'd need to poll the power monitoring MSRs on every occasion where the
> accounting happens.
> This completely falls apart when you have a zero-exit guest (think
> NOHZ_FULL). Then you'd have to bring the guest out with an IPI to access
> the per-CPU MSRs.
> I assume a lot of people will be happy about all that :)
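To make the MSR-polling idea concrete, here is a minimal sketch of how an accounting hook could turn two counter snapshots into a "really busy" fraction. It assumes an Intel host where the core C-state residency counters tick at the same rate as the TSC. The struct and function names are hypothetical; on a real host the values might come from pread() on /dev/cpu/N/msr (IA32_TIME_STAMP_COUNTER at 0x10, MSR_CORE_C6_RESIDENCY at 0x3fd, and so on; which residency MSRs exist is model-specific). The read has to execute on the CPU running the guest, which is where the IPI cost for a zero-exit guest comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of one CPU's counters. On an Intel host these could
 * be read via the msr driver (pread() on /dev/cpu/N/msr), which itself runs
 * the RDMSR on CPU N. Assumes residency counts at the TSC rate. */
struct cstate_snapshot {
	uint64_t tsc;        /* reference cycles elapsed */
	uint64_t cstate_res; /* sum of core C-state residency counters */
};

/* Fraction (0.0 to 1.0) of the interval in which the core was NOT in a
 * deep C-state, i.e. a rough estimate of real execution time. Unsigned
 * subtraction keeps the deltas correct across a counter wrap. */
double busy_fraction(const struct cstate_snapshot *prev,
		     const struct cstate_snapshot *now)
{
	uint64_t tsc_delta = now->tsc - prev->tsc;
	uint64_t idle_delta = now->cstate_res - prev->cstate_res;

	assert(tsc_delta > 0);
	if (idle_delta > tsc_delta) /* clamp skew between the two reads */
		idle_delta = tsc_delta;
	return 1.0 - (double)idle_delta / (double)tsc_delta;
}
```

For example, 1000 TSC ticks during which 750 ticks of residency accumulated yields 0.25, i.e. the QEMU thread would be charged a quarter of the interval rather than all of it.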

Ankur (CC-ed) mentioned some ideas to me about using the perf counters (in
the host) to sample the guest and build a more accurate accounting of what
the guest is actually doing. That way the dashboard on the host would not
show 100% CPU utilization.
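One way such sampling could be turned into an accounting number, sketched under loud assumptions: the host takes time-based samples of the guest instruction pointer (an unhalted-cycles event presumably stops counting while the core sits in a C-state, so it would under-sample idle) and counts how many land inside the guest's MWAIT idle routine. The address range, struct and function names are hypothetical, and this is not a description of Ankur's actual proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range of the guest's MWAIT idle routine. How the
 * host learns it (e.g. from guest symbols) is assumed, not shown. */
struct ip_range {
	uint64_t start; /* inclusive */
	uint64_t end;   /* exclusive */
};

/* Estimate the fraction of samples in which the guest was doing real work,
 * i.e. its sampled instruction pointer was outside the idle routine. */
double guest_busy_estimate(const uint64_t *guest_ips, size_t n,
			   struct ip_range idle)
{
	size_t idle_hits = 0;

	assert(idle.start <= idle.end);
	if (n == 0)
		return 0.0; /* no samples: no evidence of work */
	for (size_t i = 0; i < n; i++)
		if (guest_ips[i] >= idle.start && guest_ips[i] < idle.end)
			idle_hits++;
	return (double)(n - idle_hits) / (double)n;
}
```

With four samples of which two fall inside the idle range, the estimate is 0.5, which the host dashboard could report instead of 100%.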

But the patches that Marcelo posted ("cpuidle-haltpoll driver") "solve" the
problem for Linux. That is, the guest wants awesome latency, and one way to
get it was to expose MWAIT to the guest; the other is to just tweak the guest
to do the idling a bit differently.
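The "tweak the guest's idling" route can be pictured as poll-then-halt: the vCPU spins for a short, adaptive window to keep wakeup latency low, then executes HLT, which normally causes a VM exit so the host sees the vCPU as idle. The sketch is a simplified model of such an adaptive window; the struct, function names and parameter values are hypothetical, and it is not the cpuidle-haltpoll code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning knobs for an adaptive poll-before-halt window.
 * Names and values are illustrative only. */
struct poll_params {
	uint64_t max_ns;     /* never poll longer than this */
	uint64_t grow_start; /* first non-zero window when growing */
	unsigned grow;       /* multiply window by this on a near miss */
	unsigned shrink;     /* divide window by this after a long idle */
};

/* Given how long the vCPU actually stayed idle (block_ns), return the next
 * poll window in nanoseconds. */
uint64_t adjust_poll_window(uint64_t window, uint64_t block_ns,
			    const struct poll_params *p)
{
	assert(p->grow >= 1 && p->shrink >= 1);

	if (block_ns > window && block_ns <= p->max_ns) {
		/* Wakeup came shortly after polling stopped: poll longer. */
		uint64_t next = window * p->grow;

		if (next < p->grow_start)
			next = p->grow_start;
		return next > p->max_ns ? p->max_ns : next;
	}
	if (block_ns > p->max_ns)
		/* Long idle: polling only burned host CPU, so back off. */
		return window / p->shrink;
	return window; /* wakeup arrived while polling: keep the window */
}
```

With max_ns = 200000, grow_start = 50000 and grow = shrink = 2, a 30 us idle starting from a zero window grows it to 50 us, while a 1 ms idle shrinks a 100 us window to 50 us. The design choice is that the guest only burns host CPU when that is likely to pay off in latency; otherwise it halts and the host's accounting reflects real idleness.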

Marcelo's patches are all good for Linux, but Windows is still an issue.

Ankur, would you be OK sharing some of your ideas?
> Thanks,
> tglx