Re: [PATCH v2 1/3] KVM: x86: implement KVM_{GET|SET}_TSC_STATE

From: Thomas Gleixner
Date: Tue Dec 08 2020 - 11:41:54 EST


On Tue, Dec 08 2020 at 13:13, Maxim Levitsky wrote:
> On Mon, 2020-12-07 at 11:29 -0600, Oliver Upton wrote:
>>
>> How would a VMM maintain the phase relationship between guest TSCs
>> using these ioctls?
>
> By using the nanosecond timestamp.
>
> While I did made it optional in the V2 it was done for the sole sake of being
> able to set TSC on (re)boot to 0 from qemu, and for cases when qemu migrates
> from a VM where the feature is not enabled.
> In this case the tsc is set to the given value exactly, just like you
> can do today with KVM_SET_MSRS.
> In all other cases the nanosecond timestamp will be given.
>
> When the userspace uses the nanosecond timestamp, the phase relationship
> would not only be maintained but be exact, even if TSC reads were not
> synchronized and even if their restore on the target wasn't synchronized as well.
>
> Here is an example:
>
> Let's assume that TSC on source/target is synchronized, and that the guest TSC
> is synchronized as well.
>
> Let's call the guest TSC frequency F (guest TSC increments by F each second)
>
> We do KVM_GET_TSC_STATE on vcpu0 and receive (t0,tsc0).
> We do KVM_GET_TSC_STATE on vcpu1 after 1 second passed (exaggerated)
> and receive (t0 + 1s, tsc0 + F)

Why?

You freeeze the VM and store the realtime timestamp of doing that. At
that point assuming a full sync host system the only interesting thing
to store is the guest offset which is the same on all vCPUs and it is
known already.

So on restore the only thing which needs to be adjusted is the guest
wide offset.

newoffset = oldoffset + (now - tfreeze)

Then set newoffset for all vCPUs. Anything else is complexity for no
value and bound to fall apart in hard to debug ways.

The offset is still the same for all vCPUs whether you can restore them
in the same nanosecond or whether you need 3 minutes for each one. It
does not matter because when you restore vCPU1 3 minutes after vCPU0
then TSC has advanced 3 minutes as well. It's still correct from the
guest POV.

Even if you support TSCADJUST and let the guest write to it does not
change the per guest offset at all. TSCADJUST is per [v]CPU and adds on
top:

tscvcpu = tsc_host + guest_offset + TSC_ADJUST

Scaling is just orthogonal and does not change any of this.

Thanks,

tglx