Re: [PATCH] KVM: LAPIC: Fix an inversion error when a negative value assigned to lapic_timer.timer_advance_ns

From: Sean Christopherson
Date: Tue May 21 2024 - 11:02:32 EST


On Tue, May 21, 2024, zhoushuling wrote:
> > On Mon, May 20, 2024, zhoushuling@xxxxxxxxxx wrote:
> > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> > index 0a0ea4b5dd8c..6fb3b16a2754 100644
> > --- a/arch/x86/kvm/lapic.h
> > +++ b/arch/x86/kvm/lapic.h
> > @@ -54,6 +54,7 @@ struct kvm_timer {
> > u32 timer_advance_ns;
> > atomic_t pending; /* accumulated triggered timers */
> > bool hv_timer_in_use;
> > + bool timer_advance_dynamic;
> > };
>
>
> However,I do not understand why the global function switch
> 'lapic_timer_advance_dynamic' > is changed to a local variable in the 'struct
> kvm_timer'. On a host, the adaptive tuning of timer advancement is global
> function, and each vcpu->apic->lapic_timer.timer_advance_dynamic of each VM
> is the same, different VMs cannot be configured with different switches.

..

>  static int __read_mostly lapic_timer_advance_ns = -1;
>  module_param(lapic_timer_advance_ns, int, 0644);

The module param is writable, i.e. can be modified while KVM is running. Eg. if
the admin changes lapic_timer_advance_ns from a negative to a postive value, then
vCPUs that were created while lapic_timer_advance_ns<0 will have a timer_advance_ns
that was dynamically calculated, but is now static. I doubt there's a use case
that actually does anything like that, and in practice it probably doesn't cause
real problems, but it makes for bizarre and unpredictable behavior.

Hmm, alternativately, we could make lapic_timer_advance_ns a read-only boolean.
The param is wrtiable primarily because dynamic/adaptive tuning was added much
later, i.e. getting a usable value required modifying the advancement time while
VMs were running. But I would be very surprised if there are use cases that still
*need* to hand tune the advancement, as it's practically impossible for userspace
to do better than KVM.

The only argument I can think of for taking a raw value from userspace is if there
is an absurd delay that exceeds KVM's max advancement of 5us. But I'm not sure
KVM should even support such values.

Let me post a patch to convert lapic_timer_advance_ns to a read-only bool. If
there is pushback on that idea, then we can circle back to this patch, but I'm
hoping we can simplify all of this instead of hardening KVM against edge cases
that no one likely cares about.

Side topic, if we keep the module param as-is, it really should be wrapped with
READ_ONCE().