Re: [patch 0/3] KVM CPU frequency change hypercalls

From: Radim Krcmar
Date: Fri Feb 03 2017 - 11:44:02 EST


2017-02-02 15:47-0200, Marcelo Tosatti:
> Implement KVM hypercalls for the guest
> to issue frequency changes.
>
> Current situation with DPDK and frequency changes is as follows:
> An algorithm in the guest decides when to increase/decrease
> frequency based on the queue length of the device.

Does the algorithm take the magnitude of frequency steps into account?

(e.g. if the CPU can step with 200 MHz granularity, does the algorithm
ever request 400 MHz at once, because it assumes that frequency would be
enough to handle the load?)
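
To make the question concrete, here is a minimal sketch of what a
multi-step policy might look like. Everything in it is an assumption for
illustration: the function name, the parameters, and the idea that each
frequency step drains a fixed number of extra queue entries are mine and
are not taken from the patch or from DPDK.

```c
/* Hypothetical: frequency granularity of the CPU, in MHz. */
#define STEP_MHZ 200

/*
 * Hypothetical policy: assume each frequency step lets the vCPU drain
 * `per_step` additional queue entries per polling interval, and ask for
 * as many steps at once as the backlog calls for, capped at `max_steps`.
 */
unsigned int steps_for_backlog(unsigned int backlog,
                               unsigned int per_step,
                               unsigned int max_steps)
{
    unsigned int steps;

    if (per_step == 0)
        return max_steps;       /* no model of the gain: go to maximum */

    steps = (backlog + per_step - 1) / per_step;   /* round up */
    return steps > max_steps ? max_steps : steps;
}
```

With a backlog of 50 entries and 32 entries per step this returns 2, i.e.
a single request for 2 * STEP_MHZ = 400 MHz, whereas a policy that always
returns 1 would need two separate requests to get there.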

> On the host, a power manager daemon is used to listen for
> frequency change requests (on another core) and issue these
> requests.
>
> However frequency changes are performance sensitive events because:
> On a change from low load condition to max load condition,
> the frequency should be raised as soon as possible.
> Sending a virtio-serial notification to another pCPU,
> waiting for that pCPU to initiate an IPI to the requestor pCPU
> to change frequency, is slower and more cache costly than
> a direct hypercall to host to switch the frequency.
>
> If the pCPU where the power manager daemon is running
> is not busy spinning on requests from the isolated DPDK vcpus,
> there is also the cost of HLT wakeup for that pCPU.
>
> Moreover, the daemon serves multiple VMs, meaning that
> the scheme is subject to additional delays from
> queueing of power change requests from VMs.

(Wow, this must be bringing humanity to its doom faster than the heat it
helps to eliminate.)

> A direct hypercall from userspace is the fastest, most direct
> method for the guest to change frequency and does not suffer
> from the issues above.

Right, userspace on bare-metal cannot change frequency directly.

> The usage scenario for these hypercalls is for pinned vCPUs <-> pCPUs.

And pinned tasks <-> vCPUs, because the guest kernel has no idea what
frequency is being used or desired on its virtual hardware, so the kernel
cannot even change frequency without introducing a bug ...

I'm not happy about this hole through layers of isolation.

The domain of valid users is very small, and the problem is that any
program with access to /dev/kvm gains the ability to change host CPU
frequency if the host happens to use the userspace governor.

We should at least enable this feature only if /dev/kvm is root-only.