Re: [patch 0/3] KVM CPU frequency change hypercalls
From: Radim Krcmar
Date: Fri Feb 03 2017 - 14:09:44 EST
2017-02-03 16:14-0200, Marcelo Tosatti:
> On Fri, Feb 03, 2017 at 05:43:50PM +0100, Radim Krcmar wrote:
>> 2017-02-02 15:47-0200, Marcelo Tosatti:
>> > Implement KVM hypercalls for the guest
>> > to issue frequency changes.
>> >
>> > Current situation with DPDK and frequency changes is as follows:
>> > An algorithm in the guest decides when to increase/decrease
>> > frequency based on the queue length of the device.
>>
>> Does the algorithm compute with the magnitude of frequency steps?
>>
>> (e.g. if CPU can step with 200 MHz granularity, does the algorithm ever
>> do 400 MHz at once, because it assumes that frequency would be enough
>> to handle the load?)
>
> No, it does not know the frequency directly. It only "knows" the
> frequency indirectly by the size of the network queue (that is, if the
> network queue is above a threshold, then frequency is "too low" and
> should be increased).
I see, thanks. You added MAX to the interface ... so DPDK has two
thresholds and forces MAX frequency after reaching the second one?
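
For illustration, the two-threshold policy being asked about could look roughly like the sketch below. It is only a guess at the shape of the algorithm: the action names, the function name and the threshold parameters are hypothetical, and none of them come from DPDK or from the patch.

```c
#include <stddef.h>

/* Hypothetical actions; not the names used by DPDK or by the patch. */
enum freq_action { FREQ_KEEP, FREQ_UP, FREQ_MAX };

/*
 * Pick an action from the queue length alone, without knowing the
 * actual frequency. Both thresholds are illustrative; lo <= hi is
 * assumed.
 */
static enum freq_action pick_action(size_t queue_len, size_t lo, size_t hi)
{
	if (queue_len > hi)
		return FREQ_MAX;	/* second threshold: go straight to max */
	if (queue_len > lo)
		return FREQ_UP;		/* first threshold: one step up */
	return FREQ_KEEP;		/* queue is short enough, leave it alone */
}
```

Note that nothing here depends on the step size in MHz, which matches the answer above: the queue length is the only input.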
>> > A direct hypercall from userspace is the fastest most direct
>> > method for the guest to change frequency and does not suffer
>> > from the issues above.
>>
>> Right, userspace on bare-metal cannot change frequency directly.
>
> Yes it can: write to sysfs (not sure what you meant).
On x86, the frequency can only be changed from CPL 0, but userspace runs
at CPL 3. sysfs is used because userspace cannot change the frequency
directly (behind the kernel's back).
(KVM could avoid trapping guest's access to MSRs that control frequency,
which would allow us to do it behind host's back, but still not directly
from guest userspace, because MSRs only work at CPL 0.)
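
To make the indirection concrete, here is a minimal sketch of what "changing frequency from userspace" means on bare metal: the program writes a value in kHz to the cpufreq `scaling_setspeed` file and the kernel does the privileged part. The helper names are made up for this example, and the write is only expected to succeed when the userspace governor is active and the caller is allowed to write the file.

```c
#include <stdio.h>

/*
 * Build the cpufreq sysfs path for one CPU.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int setspeed_path(unsigned int cpu, char *buf, size_t len)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed",
			 cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask the kernel to set the frequency of one CPU, in kHz.
 * Userspace only writes a file; the cpufreq driver performs the
 * actual CPL 0 operation. Returns 0 on success, -1 on any error.
 */
static int set_cpu_khz(unsigned int cpu, unsigned long khz)
{
	char path[128];
	FILE *f;
	int ok;

	if (setspeed_path(cpu, path, sizeof(path)))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%lu\n", khz) > 0;
	if (fclose(f) != 0)	/* the write may only fail at flush time */
		ok = 0;

	return ok ? 0 : -1;
}
```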
>> > The usage scenario for this hypercalls is for pinned vCPUs <-> pCPUs.
>>
>> And pinned tasks <-> vCPUs, because the guest kernel has no idea what
>> frequency is being used or desired on its virtualware,
>
> And it does not have to know...
Probably not in DPDK setups, but it has to know in general.
>> so the kernel
>> cannot even change frequency without introducing a bug ...
>
> Not sure what are you thinking, please be more verbose.
One reason why we have a kernel/userspace split is to allow sharing of
CPU time. Each application then has its own state that the kernel keeps track
of and saves/restores while time-multiplexing.
Our frequency scaling interface goes against the idea -- guest kernel
cannot schedule multiple userspaces on the same vCPU, because they could
conflict by overriding frequency.
I.e. our feature implies userspace tasks pinned to isolated vCPUs.
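
The conflict can be shown with a toy model. This is only a sketch under the assumption that the frequency is a single per-vCPU setting that the guest kernel neither saves nor restores on a task switch; the struct and function names are hypothetical.

```c
/* Toy model: one frequency setting per vCPU, nothing saved per task. */
struct vcpu_model {
	unsigned long khz;
};

/*
 * A task asks for a frequency directly, bypassing the guest kernel.
 * The request simply overwrites whatever the previous task asked for.
 */
static void task_request(struct vcpu_model *v, unsigned long khz)
{
	v->khz = khz;
}
```

If task A requests 3000000 kHz and task B, scheduled later on the same vCPU, requests 1200000 kHz, then A runs at 1200000 kHz the next time it is scheduled: the last writer wins.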
>> I'm not happy about this hole through layers of isolations.
>>
>> The domain of valid users is very small and a problem is that any
>> program with access to /dev/kvm gains the ability to change host CPU
>> frequency if the host happens to use the userspace governor.
>
> Yes.
>
>> We should at least enable this feature only if /dev/kvm is root-only.
>
> Fine, can change that, will fix in -v2. Maybe there is a capability
> to change frequency... should require that capability (or root
> if there is none).
Capability sounds good too.
Thanks.