Re: [PATCH v3 09/12] KVM: VMX: Remove vmx->current_tsc_ratio and decache_tsc_multiplier()

From: Stamatis, Ilias
Date: Tue May 25 2021 - 06:42:20 EST


On Mon, 2021-05-24 at 18:44 +0000, Sean Christopherson wrote:
> On Mon, May 24, 2021, Maxim Levitsky wrote:
> > On Fri, 2021-05-21 at 11:24 +0100, Ilias Stamatis wrote:
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 4b70431c2edd..7c52c697cfe3 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -1392,9 +1392,8 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> > >  	}
> > >
> > >  	/* Setup TSC multiplier */
> > > -	if (kvm_has_tsc_control &&
> > > -	    vmx->current_tsc_ratio != vcpu->arch.tsc_scaling_ratio)
> > > -		decache_tsc_multiplier(vmx);
> > > +	if (kvm_has_tsc_control)
> > > +		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);
> >
> > This might have an overhead of writing the TSC scaling ratio even if
> > it is unchanged. I haven't measured how expensive vmread/vmwrites are but
> > at least when nested, the vmreads/vmwrites can be very expensive (if they
> > cause a vmexit).
> >
> > This is why I think the 'vmx->current_tsc_ratio' exists - to have
> > a cached value of TSC scale ratio to avoid either 'vmread'ing
> > or 'vmwrite'ing it without a need.

Right. I assumed the overhead would be insignificant since we already do
lots of vmwrites on vmentry/vmexit, but yeah, there is no reason to introduce
any extra overhead.
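
For reference, the trade-off can be sketched with a toy model. This is not
the real KVM code: every toy_* name is made up for illustration, and the
"VMCS" here is just a struct that counts writes so the difference is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the VMCS field; counts how often it is written. */
struct toy_vmcs {
	uint64_t tsc_multiplier;
	unsigned int nr_writes;
};

/* Toy vCPU: desired ratio plus a cache of the last value written. */
struct toy_vcpu {
	struct toy_vmcs vmcs;
	uint64_t tsc_scaling_ratio;
	uint64_t cached_ratio;
};

static void toy_vmcs_write64(struct toy_vmcs *vmcs, uint64_t val)
{
	vmcs->tsc_multiplier = val;
	vmcs->nr_writes++;
}

/* Cached variant: write only when the cached value is stale. */
static void toy_load_cached(struct toy_vcpu *v)
{
	if (v->cached_ratio != v->tsc_scaling_ratio) {
		v->cached_ratio = v->tsc_scaling_ratio;
		toy_vmcs_write64(&v->vmcs, v->cached_ratio);
	}
}

/* Uncached variant (what this patch does): write on every load. */
static void toy_load_uncached(struct toy_vcpu *v)
{
	toy_vmcs_write64(&v->vmcs, v->tsc_scaling_ratio);
}

/* Three loads with an unchanged ratio: one write vs. three writes. */
static void toy_demo(void)
{
	struct toy_vcpu a = { .tsc_scaling_ratio = 5 };
	struct toy_vcpu b = { .tsc_scaling_ratio = 5 };
	int i;

	for (i = 0; i < 3; i++) {
		toy_load_cached(&a);
		toy_load_uncached(&b);
	}
	assert(a.vmcs.nr_writes == 1);
	assert(b.vmcs.nr_writes == 3);
}
```

Both variants end up with the same value in the field; the only difference
is the number of writes, which is what matters if each write can cause a
vmexit when running nested.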

I'm fine with this particular patch getting dropped. It's not directly related
to the series anyway.

>
> Yes, but its existence is a complete hack. vmx->current_tsc_ratio has the same
> scope as vcpu->arch.tsc_scaling_ratio, i.e. vmx == vcpu == vcpu->arch. Unlike
> per-VMCS tracking, it should not be useful, keyword "should".
>
> What I meant by my earlier comment:
>
> Its use in vmx_vcpu_load_vmcs() is basically "write the VMCS if we forgot to
> earlier", which is all kinds of wrong.
>
> is that vmx_vcpu_load_vmcs() should never write vmcs.TSC_MULTIPLIER. The correct
> behavior is to set the field at VMCS initialization, and then immediately set it
> whenever the ratio is changed, e.g. on nested transition, from userspace, etc...
> In other words, my unclear feedback was to make it obsolete (and drop it) by
> fixing the underlying mess, not to just drop the optimization hack.

I understood this and replied earlier. The right place to update the hw
multiplier field is inside set_tsc_khz() in common code, when the ratio
changes. However, this requires adding another vendor callback, among other
changes. Since all of this is further refactoring, I believe it's better to
leave this series as is, i.e. only touching code that is directly related to
nested TSC scaling, rather than trying to do everything in the same series.
This makes testing easier too. We can still implement these changes later.
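
To make the later refactoring concrete, here is a rough sketch of the
"write on change" shape. It is a toy model under my own assumptions: the
sk_* names and the write_tsc_multiplier hook are hypothetical and do not
correspond to existing kernel symbols in this series.

```c
#include <assert.h>
#include <stdint.h>

struct sk_vcpu;

/* Hypothetical vendor hook table; stands in for a per-vendor callback. */
struct sk_vendor_ops {
	void (*write_tsc_multiplier)(struct sk_vcpu *vcpu, uint64_t multiplier);
};

struct sk_vcpu {
	const struct sk_vendor_ops *ops;
	uint64_t tsc_scaling_ratio;	/* software view of the ratio */
	uint64_t hw_multiplier;		/* toy stand-in for the hw field */
	unsigned int nr_hw_writes;
};

/* Toy "VMX" implementation of the hook: record the write. */
static void sk_vmx_write_tsc_multiplier(struct sk_vcpu *vcpu, uint64_t multiplier)
{
	vcpu->hw_multiplier = multiplier;
	vcpu->nr_hw_writes++;
}

static const struct sk_vendor_ops sk_vmx_ops = {
	.write_tsc_multiplier = sk_vmx_write_tsc_multiplier,
};

/* Common code: the ratio and the hw field are updated together. */
static void sk_set_tsc_ratio(struct sk_vcpu *vcpu, uint64_t ratio)
{
	vcpu->tsc_scaling_ratio = ratio;
	vcpu->ops->write_tsc_multiplier(vcpu, ratio);
}

/* vCPU load path: no longer needs to touch the multiplier at all. */
static void sk_vcpu_load(struct sk_vcpu *vcpu)
{
	(void)vcpu;
}
```

With this shape there is nothing left to cache: the hw field is written
exactly once per ratio change, and loading the vCPU never writes it.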

Thanks,
Ilias