Re: [PATCH v2 1/1] KVM: x86: fix MSR_IA32_TSC read for nested migration

From: Maxim Levitsky
Date: Tue Sep 22 2020 - 11:39:48 EST


On Tue, 2020-09-22 at 17:50 +0300, Maxim Levitsky wrote:
> On Tue, 2020-09-22 at 14:50 +0200, Paolo Bonzini wrote:
> > On 21/09/20 18:23, Sean Christopherson wrote:
> > > Avoid "should" in code comments and describe what the code is doing, not what
> > > it should be doing. The only exception for this is when the code has a known
> > > flaw/gap, e.g. "KVM should do X, but because of Y, KVM actually does Z".
> > >
> > > > + * return it's real L1 value so that its restore will be correct.
> > > s/it's/its
> > >
> > > Perhaps add "unconditionally" somewhere, since arch.tsc_offset can also contain
> > > the L1 value. E.g.
> > >
> > > /*
> > >  * Unconditionally return L1's TSC offset on userspace reads
> > >  * so that userspace reads and writes always operate on L1's
> > >  * offset, e.g. to ensure deterministic behavior for migration.
> > >  */
> > >
> >
> > Technically the host need not restore MSR_IA32_TSC at all. This follows
> > the idea of the discussion with Oliver Upton about transmitting the
> > state of the kvmclock heuristics to userspace, which include a (TSC,
> > CLOCK_MONOTONIC) pair to transmit the offset to the destination. All
> > that needs to be an L1 value is then the TSC value in that pair.
> >
> > I'm a bit torn over this patch. On one hand it's an easy solution, on
> > the other hand it's... just wrong if KVM_GET_MSR is used for e.g.
> > debugging the guest.
>
> Could you explain why, though? After my patch, KVM_GET_MSR will consistently
> read the L1 TSC, just like all the other MSRs, as I explained. I guess for debugging
> this should work?
>
> The fact that guest TSC reads use the guest offset is a nice exception made for guests
> that insist on reading this MSR without interception instead of using RDTSC.
>
> Best regards,
> Maxim Levitsky
>
> > I'll talk to Maxim and see if he can work on the kvmclock migration stuff.

We talked about this on IRC, and now I am also convinced that we should implement
proper TSC migration instead, so I'll drop this patch and implement that.
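
To illustrate the (TSC, clock) pair approach Paolo describes, here is a minimal sketch of how a destination could derive L1's TSC offset from such a pair. It assumes no TSC scaling, a fixed guest TSC frequency in kHz, and that the clock readings on the source and destination share a common nanosecond time base; `struct tsc_clock_pair`, `expected_guest_tsc()` and `dest_l1_tsc_offset()` are hypothetical names, not KVM UAPI or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot taken on the source: the L1 guest TSC value and
 * the clock reading (in ns) at which it was sampled. Only the TSC value
 * needs to be an L1 value. */
struct tsc_clock_pair {
	uint64_t guest_tsc;
	uint64_t clock_ns;
};

/* Guest TSC the destination should present at clock time now_ns.
 * Assumes the guest TSC kept ticking at tsc_khz during migration and
 * that now_ns is on the same time base as pair->clock_ns. */
static uint64_t expected_guest_tsc(const struct tsc_clock_pair *pair,
				   uint64_t now_ns, uint64_t tsc_khz)
{
	assert(tsc_khz > 0 && now_ns >= pair->clock_ns);

	uint64_t elapsed_ns = now_ns - pair->clock_ns;

	/* kHz = ticks per ms, so ticks = elapsed_ns * tsc_khz / 1e6.
	 * Split into whole ms plus remainder to avoid overflow. */
	uint64_t ms = elapsed_ns / 1000000ULL;
	uint64_t rem_ns = elapsed_ns % 1000000ULL;
	uint64_t ticks = ms * tsc_khz + rem_ns * tsc_khz / 1000000ULL;

	return pair->guest_tsc + ticks;
}

/* L1 TSC offset for the destination, such that
 * guest TSC == host TSC + offset (mod 2^64). */
static uint64_t dest_l1_tsc_offset(const struct tsc_clock_pair *pair,
				   uint64_t now_ns, uint64_t tsc_khz,
				   uint64_t host_tsc_now)
{
	return expected_guest_tsc(pair, now_ns, tsc_khz) - host_tsc_now;
}
```

Splitting the elapsed time into whole milliseconds plus a remainder gives exactly the same result as elapsed_ns * tsc_khz / 1000000, but keeps the multiplication from overflowing for long downtimes.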

For the last few weeks I have been digging through all the timing code, and I mostly
understand it now, so it shouldn't take me long to implement.

I hope this will make nested migration fully stable, since even with this patch
it still sometimes hangs. On my AMD machine it takes about half a day of migration
cycles to reproduce the hang, but on my Intel laptop I can hang the nested guest
after 10-20 cycles even with this patch. The symptoms look very similar to the issue
that this patch tried to fix.

Maybe we should keep the *comment* I added to document this odd TSC read behavior.
When I implement the whole thing, I may add a comment-only version of this patch
for that.
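
The read behavior such a comment would document can be sketched as the user-space model below. It assumes no TSC scaling; `struct vcpu_tsc_model` and `read_msr_ia32_tsc()` are hypothetical, with field names chosen to echo KVM's `arch.tsc_offset` and `arch.l1_tsc_offset`. This is not the actual kvm_get_msr_common() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a vCPU's TSC offsets (no TSC scaling). */
struct vcpu_tsc_model {
	uint64_t l1_tsc_offset; /* offset L1 sees */
	uint64_t tsc_offset;    /* offset currently in effect: L1's, or
	                         * (typically) L1's plus the offset L1
	                         * programmed for L2 while L2 runs */
};

/*
 * Unconditionally return L1's TSC on host-initiated (userspace) reads so
 * that userspace reads and writes always operate on L1's offset, e.g. to
 * ensure deterministic behavior for migration. Guest RDMSR reads use the
 * currently active offset, matching what RDTSC returns in that guest.
 */
static uint64_t read_msr_ia32_tsc(const struct vcpu_tsc_model *v,
				  uint64_t host_tsc, bool host_initiated)
{
	uint64_t offset = host_initiated ? v->l1_tsc_offset : v->tsc_offset;

	return host_tsc + offset;
}
```

With L2 running, a userspace read and a guest read of the same vCPU therefore differ by exactly the L2 offset.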

Best regards,
Maxim Levitsky

> >
> > Paolo
> >