Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
From: Jaroslav Pulchart
Date: Thu May 07 2026 - 05:35:29 EST
>
> On Wed, May 06, 2026, Jaroslav Pulchart wrote:
> > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > > On 5/6/26 14:55, Sean Christopherson wrote:
> > > > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > > > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > > > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > > > >>>>
> > > > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > > > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > > > >>>> the vCPU is loaded for the first time.
> > > > >>>>
> > > > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > > > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > > > >>>> scheduling does not generate a steady stream of redundant clock update
> > > > >>>> requests.
> > > > >>>>
> > > > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > > >>>> Signed-off-by: Lei Chen <lei.chen@xxxxxxxxxx>
> > > > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@xxxxxxxxxxxx>
> > > > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@xxxxxxxxxxxxxx/
> > > > >>
> > > > >> Was this performance regression ever addressed?
> > > > > Nope, not yet.
> > > > >
> > > > > >> Looks like this fell through the cracks, but it's easy to miss something.
> > > > >
> > > > > It's in my list of patches to apply (probably for 7.2?). I didn't want to squeeze
> > > > > it into the initial 7.1 pull request for a variety of reasons.
> > > >
> > > > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> > > >
> > > > Given that this is a fix for a performance regression[1] I'd say it's
> > > > not as urgent as a "something stopped working" case -- so I guess it's
> > > > something where the "[fix] "within a week", preferably before the next
> > > > rc" approach Linus recently mentioned does not need to be applied strictly.
> > > >
> > > > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > > > then called it something that "severely impacts KVM hosts running many
> > > > Firecracker microVMs."[1];
> > >
> > > For a setup that is likely broken. On modern hardware, the path in question
> > > should never actually be hit. I do want to resolve the bug since older hardware
> > > and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
> > >
> > > More importantly, the original reporter(s) hasn't responded to any of our questions,
> > > or to the proposed fix. I'm not going to rush in a fix if I don't actually *know*
> > > it's going to fix the original problem.
> >
> > Hi Sean, Thorsten,
> >
> > sorry for the missing response from my side, this thread unfortunately
> > ended up in trash due to mail filters on my side and I completely
> > missed it.
>
> No worries, gmail's Spam filter is my nemesis :-)
>
> > I currently don't have the full context loaded back in yet, but I'll re-read
> > the thread and follow up properly once I do.
>
> I think the only remaining question is why/how KVM's master clock is getting
> disabled. But that's more of a question for your deployment than it is a question
> for upstream; it's possible there's a different KVM bug lurking, but it's more
> likely that something in your setup is incompatible with using the master clock.
>
> Note, it's certainly not "wrong" for the master clock to be disabled, but it's
> quite surprising, especially for Firecracker VMs. It's worth investigating as
> there might be an underlying issue that's very easy to address, and "fixing" it
> should provide (very) small performance benefits.

I've dug into the master clock question and have an idea.

Our Firecracker hosts are themselves L1 KVM VMs (nested virtualisation)
running on AMD EPYC 9454P and EPYC 9455 hardware. Even though the compute
nodes use cpu_mode=host-passthrough with QEMU/KVM, QEMU filters out the
invtsc CPUID bit, which I hadn't realized. Without it, the L1 guest kernel
marks the TSC unstable at boot:

  tsc: Marking TSC unstable due to TSCs unsynchronized

and falls back to kvm-clock as its clocksource.

I suppose that in turn prevents KVM from enabling the master clock for
any L2 guests (the Firecracker microVMs), am I right?

I have resolved the issue by explicitly adding +invtsc to
cpu_model_extra_flags in our OpenStack nova.conf. After this change,
the L1 VMs correctly show constant_tsc and nonstop_tsc in
/proc/cpuinfo and switch their clocksource to tsc. On a vanilla 7.0.3
kernel (without the v2 patch), I also confirmed that the IPI storm
disappears when +invtsc is present and returns when it is absent.
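
In case it helps anyone check their own L1 hosts, below is a minimal
Python sketch of the check I describe above: it looks for constant_tsc
and nonstop_tsc in /proc/cpuinfo and prints the current clocksource.
The function names are made up for illustration, and the final hint is
only a heuristic, not a statement about KVM's master clock state.

```python
#!/usr/bin/env python3
"""Report whether this guest sees the TSC flags and which clocksource it uses."""
from pathlib import Path

CPUINFO = Path("/proc/cpuinfo")
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)
# Flags mentioned in this thread; their absence suggests invtsc was filtered.
WANTED_FLAGS = ("constant_tsc", "nonstop_tsc")


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_tsc_flags(cpuinfo_text):
    """Return the wanted TSC flags that are absent, in a stable order."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in WANTED_FLAGS if flag not in flags]


def main():
    missing = missing_tsc_flags(CPUINFO.read_text())
    clocksource = CLOCKSOURCE.read_text().strip()
    print("missing TSC flags:", ", ".join(missing) or "none")
    print("current clocksource:", clocksource)
    if missing or clocksource != "tsc":
        # Heuristic only: this does not inspect KVM's master clock directly.
        print("hint: check whether invtsc is exposed to this guest")


if __name__ == "__main__":
    main()
```

Run inside the L1 VM, I would expect it to report no missing flags and
a clocksource of tsc once +invtsc is in place.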

So could this be the answer to your question: "the master clock was
disabled because QEMU silently drops invtsc even in host-passthrough
mode"?