Re: [PATCH 2/2] x86, kvm: use kvmclock to compute TSC deadline value

From: Paolo Bonzini
Date: Fri Sep 16 2016 - 11:06:47 EST

On 16/09/2016 16:59, Radim Krčmář wrote:
> KVM_MSR_DEADLINE would be an interface in kvmclock nanosecond values and
> MSR_IA32_TSCDEADLINE in TSC values. KVM_MSR_DEADLINE would follow
> similar rules as MSR_IA32_TSCDEADLINE -- the interrupt fires when
> kvmclock reaches the value, you read what you write, and 0 disarms it.
> If the TSC deadline timer was enabled, then the guest could write to
> both MSR_IA32_TSCDEADLINE and KVM_MSR_DEADLINE, but only one could be
> armed at any time (non-zero write to one will set the other to 0).
> The dual interface would allow unconditional addition of the PV feature
> without regressing users that currently use MSR_IA32_TSCDEADLINE and
> adapted their stack to handle KVM's TSC shortcomings ...

So far so good. My question is: what happens if you write to
KVM_MSR_DEADLINE and read from MSR_IA32_TSCDEADLINE, or vice versa?

The possibilities are:

a) you read a 0

b) you read the value converted to the other unit

c) you read another value such as -1

(a) and (c) are the simplest of course. (c) may make sense when writing
to MSR_IA32_TSCDEADLINE and reading from KVM_MSR_DEADLINE, since we can
decide which values are valid or not; -1 is technically a valid TSC
deadline.
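
To make the three options concrete, here is a rough model of the
proposed pair of MSRs. It is only a sketch: the type and function names
are made up for illustration, the converted value is passed in by the
caller rather than computed, and it assumes that writing 0 to the MSR
that is not armed leaves the other one alone (the proposal does not
spell that case out).

```c
#include <stdint.h>

/* Illustrative names only; none of these come from the patch. */
enum deadline_msr { MSR_NONE, MSR_TSC, MSR_KVMCLOCK };
enum cross_read   { CROSS_READ_ZERO, CROSS_READ_CONVERTED, CROSS_READ_INVALID };

struct deadline_state {
    enum deadline_msr armed_by; /* MSR that armed the timer, or MSR_NONE */
    uint64_t value;             /* deadline in that MSR's own unit */
};

static void deadline_write(struct deadline_state *s, enum deadline_msr msr,
                           uint64_t value)
{
    if (value != 0) {
        /* A non-zero write arms this MSR and implicitly zeroes the other. */
        s->armed_by = msr;
        s->value = value;
    } else if (s->armed_by == msr) {
        /* Assumption: writing 0 only disarms the MSR that was armed. */
        s->armed_by = MSR_NONE;
        s->value = 0;
    }
}

static uint64_t deadline_read(const struct deadline_state *s,
                              enum deadline_msr msr, enum cross_read policy,
                              uint64_t converted)
{
    if (s->armed_by == MSR_NONE)
        return 0;
    if (s->armed_by == msr)
        return s->value;        /* you read what you wrote */

    /* Reading the MSR that was not used to arm the timer. */
    switch (policy) {
    case CROSS_READ_CONVERTED:
        return converted;       /* option (b) */
    case CROSS_READ_INVALID:
        return UINT64_MAX;      /* option (c), i.e. -1 */
    case CROSS_READ_ZERO:
    default:
        return 0;               /* option (a) */
    }
}
```

With (a), the "non-zero write to one will set the other to 0" rule falls
out directly; (b) and (c) only differ in what the cross read returns.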

I'm not sure about whether to allow (b). In the end KVM is going to
convert a nsec deadline to a TSC value internally, and vice versa. On
the other hand, if we do, userspace needs to figure out (on migration)
whether the guest set up a TSC or a nanosecond deadline.
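
For reference, the conversion in (b) could look roughly like the sketch
below. This is not how KVM or pvclock actually do it (they use a
multiplier and a shift); it is a simplified version that assumes a
constant TSC frequency in kHz, uses the GCC/Clang unsigned __int128
extension to avoid overflow, and uses helper names that are made up.

```c
#include <assert.h>
#include <stdint.h>

/* ticks = ns * tsc_khz / 10^6, rounded down. Hypothetical helper. */
static uint64_t ns_to_tsc(uint64_t ns, uint32_t tsc_khz)
{
    assert(tsc_khz != 0);
    return (uint64_t)(((unsigned __int128)ns * tsc_khz) / 1000000u);
}

/* ns = ticks * 10^6 / tsc_khz, rounded down. Hypothetical helper. */
static uint64_t tsc_to_ns(uint64_t ticks, uint32_t tsc_khz)
{
    assert(tsc_khz != 0);
    return (uint64_t)(((unsigned __int128)ticks * 1000000u) / tsc_khz);
}
```

Note that the round trip is lossy because both directions round down:
at 2.5 GHz, a 1 ns value becomes 2 ticks, which converts back to 0 ns.
So a value read through the other MSR is not guaranteed to convert back
to exactly what was written.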

>> this lets userspace decide whether to set a nsec-based
>> deadline or a TSC-based deadline after migration.
> Hm, isn't switching to TSC-based deadline after migration pointless?

Yes, but I didn't mean that. I meant preserving which MSR was written
to arm the timer, and redoing the same on the destination.
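
Something along these lines, as a sketch of what userspace would carry
across migration. The structures and the function are hypothetical, not
an existing KVM or QEMU interface; the point is only that the saved
state records the MSR that armed the timer together with the value in
that MSR's own unit.

```c
#include <stdint.h>

/* Illustrative names only. */
enum deadline_msr { MSR_NONE, MSR_TSC, MSR_KVMCLOCK };

/* What the source host saves for the timer. */
struct saved_deadline {
    enum deadline_msr armed_by; /* which MSR the guest wrote */
    uint64_t value;             /* value as written, in that MSR's unit */
};

/* One MSR write to replay on the destination. */
struct msr_write {
    enum deadline_msr msr;
    uint64_t data;
};

/*
 * Returns 1 and fills *out if the destination has to re-arm the timer,
 * 0 if no deadline was armed. No unit conversion happens here: the
 * same MSR gets the same value back.
 */
static int restore_deadline(const struct saved_deadline *s,
                            struct msr_write *out)
{
    if (s->armed_by == MSR_NONE || s->value == 0)
        return 0;
    out->msr = s->armed_by;
    out->data = s->value;
    return 1;
}
```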

>>>> This still wouldn't handle old hosts of course.
>>> The question is whether we want to carry around 150 LOC because of old
>>> hosts. I'd just fix Linux to avoid deadline TSC without invariant TSC.
>>> :)
>> Yes, that would automatically blacklist it on KVM. You'd also need to
>> update the recent optimization to the TSC deadline timer, to also work
>> on other APIC timer modes or at least in your new PV mode.
> All modes shouldn't be much harder than just the PV mode.

The PV mode would still be a bit easier, since it's still the TSC
deadline timer, just with a nicer interface that is not based on the TSC.
Depends on how you code it though, I guess.