Re: [PATCH v4 00/10] make L2's kvm-clock stable, get rid of pvclock_gtod_copy in KVM

From: Paolo Bonzini
Date: Wed Aug 02 2017 - 13:11:53 EST

On 02/08/2017 18:49, John Stultz wrote:
> On Wed, Aug 2, 2017 at 7:38 AM, Denis Plotnikov
> <dplotnikov@xxxxxxxxxxxxx> wrote:
>> V4:
>> * removed the "is stable" function, whose definition of stability was
>> vague; there is now only one function, which gets the time together
>> with the cycle stamp
>> * some variables renamed
>> * some patches split into smaller ones
>> * atomic64_t usage is replaced with atomic_t
>> V3:
>> Changing the timekeeper interface for clocksource reading looks like
>> overkill to achieve the goal of getting a cycle stamp for KVM.
>> Instead, extend the timekeeping interface and add functions which provide
>> necessary data: read clocksource with cycles stamp, check whether the
>> clock source is stable.
>> Use those functions and improve existing timekeeper functionality to
>> replace pvclock_gtod_copy scheme in masterclock data calculation.
>> V2:
>> The main goal is to make L2's kvm-clock stable when it's running over an
>> L1 with a stable kvm-clock.
>> The patch series is for the x86 architecture only. If the series is
>> approved, I'll make the changes for other architectures, but I'm not able
>> to compile and test every single one (help needed).
>> The patch series does the following:
>> * change timekeeper interface to get cycles stamp value from
>> the timekeeper
>> * get rid of pvclock copy in KVM by using the changed timekeeper
>> interface: get time and cycles right from the timekeeper
>> * make KVM recognize a stable kvm-clock as a stable clocksource
>> and use the KVM masterclock in this case, which makes
>> L2 stable when running over a stable L1 kvm-clock
> So, from a brief skim, I'm not a big fan of this patchset. Though this
> is likely in part because I haven't seen anything about *why*
> these changes are needed.

From my selfish KVM maintainer point of view, one advantage is that it
removes KVM's knowledge of the timekeeper's internal workings, using
ktime_get_snapshot instead. These are patches 1-5. Structuring the
series like this was my idea, so I take the blame.
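The point of switching to ktime_get_snapshot is that the timekeeper hands back a time value together with the clocksource cycle value it was derived from, so KVM no longer needs a private copy of the clocksource parameters (pvclock_gtod_copy). The following is a simplified userspace sketch of that idea, not kernel code: every toy_ name is hypothetical, the conversion assumes the usual clocksource form (delta * mult) >> shift, and the seqcount retry loop the kernel uses for consistency is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified timekeeper state; not the kernel's struct timekeeper. */
struct toy_timekeeper {
	uint64_t cycle_last; /* counter value at the last timekeeper update */
	uint64_t base_ns;    /* nanoseconds corresponding to cycle_last */
	uint32_t mult;       /* ns = (cycles * mult) >> shift */
	uint32_t shift;
	uint64_t mask;       /* counter width mask, handles wraparound */
};

/* A time value plus the raw counter reading it was computed from. */
struct toy_snapshot {
	uint64_t cycles;
	uint64_t ns;
};

/* Hypothetical stand-in for the hardware counter (e.g. the TSC). */
static uint64_t toy_counter;

static uint64_t toy_read_counter(void)
{
	return toy_counter;
}

/*
 * Read the counter once and derive the time from that same reading, so the
 * caller gets a consistent (cycles, ns) pair without keeping its own copy of
 * mult/shift. Assumes the delta stays small enough for delta * mult to fit
 * in 64 bits; the kernel's seqcount retry loop is omitted.
 */
static struct toy_snapshot toy_get_snapshot(const struct toy_timekeeper *tk)
{
	struct toy_snapshot snap;
	uint64_t delta;

	assert(tk->shift < 64);
	snap.cycles = toy_read_counter();
	delta = (snap.cycles - tk->cycle_last) & tk->mask;
	snap.ns = tk->base_ns + ((delta * tk->mult) >> tk->shift);
	return snap;
}
```

With mult = 1 << 23 and shift = 24 each cycle is 0.5 ns, so 2000 cycles past cycle_last add 1000 ns to base_ns; the mask keeps this correct across a 32-bit counter wrap.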

As for patches 6-10, KVM is currently only able to provide vsyscalls if
the host is using the TSC. However, with nested virtualization
you have

L0: bare-metal hypervisor (uses TSC)
L1: nested hypervisor (uses kvmclock, can use vsyscall)
L2: nested guest

and L2 cannot use vsyscall because it is not using the TSC. This series
lets you use the vsyscall in L2 as long as L1 can.
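The eligibility rule described above can be modeled as a small predicate. This is a hedged sketch, not KVM's code: the enum, struct and function names are hypothetical, and "stable" is reduced to a single assumed boolean flag, since what makes a clocksource "good" is exactly what is left undefined below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clocksource kinds; not KVM's real enums or vclock modes. */
enum toy_host_clock { TOY_CLOCK_TSC, TOY_CLOCK_KVMCLOCK, TOY_CLOCK_OTHER };

struct toy_host {
	enum toy_host_clock clock; /* clocksource the (possibly nested) host uses */
	bool kvmclock_stable;      /* assumed "stable" flag, only for kvmclock */
};

/*
 * Can this host offer its guests a vsyscall-capable kvmclock?
 * Without the series only a TSC-based host qualifies; with it, a host
 * running on a stable kvmclock (an L1 over L0) qualifies too.
 */
static bool toy_guest_vsyscall_ok(const struct toy_host *h, bool series_applied)
{
	assert(h);
	if (h->clock == TOY_CLOCK_TSC)
		return true;
	return series_applied && h->clock == TOY_CLOCK_KVMCLOCK &&
	       h->kvmclock_stable;
}
```

Here an L0 host on the TSC always qualifies, while an L1 on a stable kvmclock qualifies only once the series is applied, which is what lets L2 use the vsyscall.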

There is one point where I couldn't help Denis as much as I wanted:
the definition of a "good" clocksource, i.e. one that KVM can use to
provide the vsyscall. I know why the patch is correct, but I couldn't
really define the concept.

ktime_get_snapshot and the users of struct system_counterval_t seem to
use "cycles" to map from TSC to ART; this is not unlike kvmclock's
use of "cycles" to map from TSC to nanoseconds at an origin point.
However, it's not clear to me whether "cycles" may be used by
adjust_historical_crosststamp even for non-TSC clocksources (or
non-kvmclock clocksources after this series). It doesn't help that
adjust_historical_crosststamp is essentially dead code, since
get_device_system_crosststamp is always called with a NULL history argument.
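For reference, kvmclock's mapping from TSC "cycles" to nanoseconds follows the pvclock ABI: take the TSC delta from tsc_timestamp, pre-shift it by tsc_shift, multiply by the 32-bit fraction tsc_to_system_mul (that is, mul / 2^32), and add system_time. The sketch below shows only that arithmetic; the field names mirror struct pvclock_vcpu_time_info, but the toy_ names are hypothetical and the version/flags retry protocol is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the pvclock fields used for the conversion. */
struct toy_pvclock_info {
	uint64_t tsc_timestamp;     /* TSC value at the origin point */
	uint64_t system_time;       /* nanoseconds at the origin point */
	uint32_t tsc_to_system_mul; /* fraction: ns per (shifted) cycle = mul / 2^32 */
	int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Shift the delta, then compute (delta * mul) >> 32 without a 128-bit type. */
static uint64_t toy_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	assert(shift > -64 && shift < 64); /* avoid undefined shift amounts */
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Split delta into 32-bit halves; both partial products fit in 64 bits. */
	uint64_t hi = delta >> 32;
	uint64_t lo = delta & 0xffffffffu;
	return hi * mul + ((lo * mul) >> 32);
}

/* Nanoseconds for a given TSC reading, relative to the origin point. */
static uint64_t toy_pvclock_read_ns(const struct toy_pvclock_info *pv, uint64_t tsc)
{
	return pv->system_time +
	       toy_scale_delta(tsc - pv->tsc_timestamp, pv->tsc_to_system_mul,
			       pv->tsc_shift);
}
```

For a 2 GHz TSC, mul = 0x80000000 with shift 0 gives 0.5 ns per cycle, so a reading 2000 cycles past tsc_timestamp adds 1000 ns to system_time.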

I'm also CCing Marcelo who wrote the KVM vsyscall code.


> Can you briefly explain the issue you're trying to solve, and why you
> think this approach is the way to go?
> (It's usually a good idea to include such a rationale in the patchset.)