Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant

From: Anton Romanov
Date: Tue Apr 12 2022 - 13:39:11 EST


On Fri, Feb 25, 2022 at 4:10 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 2/25/22 02:39, Sean Christopherson wrote:
> > Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
> > host TSC is constant, in which case the actual TSC frequency will never
> > change and thus capturing the "max" TSC during initialization is
> > unnecessary, KVM can simply use tsc_khz during VM creation.
> >
> > On CPUs with constant TSC, but not a hardware-specified TSC frequency,
> > snapshotting max_tsc_khz and using that to set a VM's default TSC
> > frequency can lead to KVM thinking it needs to manually scale the guest's
> > TSC if refining the TSC completes after KVM snapshots tsc_khz. The
> > actual frequency never changes, only the kernel's calculation of what
> > that frequency is changes. On systems without hardware TSC scaling, this
> > either puts KVM into "always catchup" mode (extremely inefficient), or
> > prevents creating VMs altogether.
> >
> > Ideally, KVM would not be able to race with TSC refinement, or would have
> > a hook into tsc_refine_calibration_work() to get an alert when refinement
> > is complete. Avoiding the race altogether isn't practical as refinement
> > takes a relative eternity; it's deliberately put on a work queue outside
> > of the normal boot sequence to avoid unnecessarily delaying boot.
> >
> > Adding a hook is doable, but somewhat gross due to KVM's ability to be
> > built as a module. And if the TSC is constant, which is likely the case
> > for every VMX/SVM-capable CPU produced in the last decade, the race can
> > be hit if and only if userspace is able to create a VM before TSC
> > refinement completes; refinement is slow, but not that slow.
> >
> > For now, punt on a proper fix, as not taking a snapshot can help some
> > uses cases and not taking a snapshot is arguably correct irrespective of
> > the race with refinement.
> >
> > Cc: Suleiman Souhlal <suleiman@xxxxxxxxxx>
> > Cc: Anton Romanov <romanton@xxxxxxxxxx>
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Queued, but I'd rather have a subject that calls out that max_tsc_khz
> needs a replacement at vCPU creation time. In fact, the real change
> (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject
> mentions only the change in kvm_timer_init().
>
> What do you think of "KVM: x86: Use current rather than max TSC
> frequency if it is constant"?
>
> Pao

Ping. This said "queued" but I don't think this ever landed.
What's the status of this?
Paolo, does this need more work?