Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time

From: Lei Chen

Date: Wed Apr 01 2026 - 02:44:11 EST


Hi Jaroslav,

I apologize for the late reply.

I have reviewed the code and identified two scenarios that currently
trigger the KVM_REQ_GLOBAL_CLOCK_UPDATE request:

Scenario 1: kvm_write_system_time
This code path occurs when the hypervisor (such as QEMU) adjusts the
time, or when the guest writes to the TSC.

Scenario 2: vCPU scheduling in kvm_arch_vcpu_load
If this function triggers KVM_REQ_GLOBAL_CLOCK_UPDATE, it indicates
that the virtual machine is not using the master_clock.

Both cases should be uncommon. Could you please share your dmesg output
and help check which of these code paths triggers
KVM_REQ_GLOBAL_CLOCK_UPDATE on your hosts?
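
To put rough numbers on the difference described in the report quoted
below, the two policies can be modeled in userspace. This is only a toy
sketch, not kernel code: the function names and the millisecond
timestamps are made up for the example, and it assumes (as the report
describes) that the old code armed a 100 ms delayed work item that is
not re-armed while it is still pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 100 ms delay quoted in the report (assumed value). */
#define UPDATE_DELAY_MS 100

/*
 * Hypothetical model of the post-revert behaviour: every vCPU load
 * immediately kicks all vCPUs of the VM.
 */
static uint64_t kicks_immediate(uint64_t nr_vcpus, uint64_t nr_loads)
{
	return nr_vcpus * nr_loads;
}

/*
 * Hypothetical model of the rate-limited behaviour: a vCPU load arms a
 * delayed work item unless one is already pending; when the item fires
 * it kicks all vCPUs once. load_ms[] holds the load timestamps in
 * milliseconds and must be sorted in ascending order.
 */
static uint64_t kicks_rate_limited(uint64_t nr_vcpus,
				   const uint64_t *load_ms, size_t nr_loads)
{
	uint64_t kicks = 0;
	uint64_t fire_at = 0;
	int pending = 0;

	for (size_t i = 0; i < nr_loads; i++) {
		assert(i == 0 || load_ms[i] >= load_ms[i - 1]);

		/* The pending work item has expired: one all-vCPU kick. */
		if (pending && load_ms[i] >= fire_at) {
			kicks += nr_vcpus;
			pending = 0;
		}

		/* Arm the work item only if none is pending. */
		if (!pending) {
			pending = 1;
			fire_at = load_ms[i] + UPDATE_DELAY_MS;
		}
	}

	/* A work item still pending at the end fires once more. */
	if (pending)
		kicks += nr_vcpus;

	return kicks;
}
```

With 4 vCPUs and one load per millisecond for one second (timestamps
0..999), this model gives 4000 kicks for the immediate policy and 40
for the rate-limited one.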


Best regards,
Lei Chen

On Mon, Mar 23, 2026 at 10:27 AM Lei Chen <lei.chen@xxxxxxxxxx> wrote:
>
> Hi Jaroslav,
>
> Thanks for testing and reporting this. I'm looking into the problem.
>
> Best regards
> Lei Chen
>
> On Sat, Mar 21, 2026 at 10:33 PM Jaroslav Pulchart
> <jaroslav.pulchart@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I am reporting a performance regression in Linux 6.19 that severely
> > impacts KVM hosts running many Firecracker microVMs.
> >
> > ==== Bisect result ====
> >
> > 446fcce2a52b533c543dabba26777813c347577c is the first bad commit
> > commit 446fcce2a52b533c543dabba26777813c347577c
> > Author: Lei Chen <lei.chen@xxxxxxxxxx>
> > Date: Tue Aug 19 23:20:26 2025 +0800
> >
> > Revert "x86: kvm: rate-limit global clock updates"
> >
> > This reverts commit 7e44e4495a398eb553ce561f29f9148f40a3448f.
> >
> > Commit 7e44e4495a39 ("x86: kvm: rate-limit global clock updates")
> > intends to use a kvmclock_update_work to sync ntp correction
> > across all vcpus kvmclock, which is based on commit 0061d53daf26f
> > ("KVM: x86: limit difference between kvmclock updates")
> >
> > Since kvmclock has been switched to mono raw, this commit can be
> > reverted.
> >
> > Signed-off-by: Lei Chen <lei.chen@xxxxxxxxxx>
> > Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@xxxxxxxxxx
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> >
> > arch/x86/include/asm/kvm_host.h | 1 -
> > arch/x86/kvm/x86.c | 29 ++++-------------------------
> > 2 files changed, 4 insertions(+), 26 deletions(-)
> >
> > ==== Symptoms ====
> >
> > Measured on a KVM host running many Firecracker microVMs
> > (node_exporter metrics, 2026-03-20):
> >
> > kernel 6.19:
> > steal time inside guest VMs: 3–24% per vCPU (sustained)
> > host system CPU (kernel mode): 3–12 CPUs saturated
> > host steal: 3–8%
> >
> > kernel 6.18 (same host, same workload after rollback):
> > steal time inside guest VMs: < 0.02% per vCPU (~200x lower)
> > host system CPU (kernel mode): 2–3 CPUs
> > host steal: 0.3–0.5%
> >
> > ==== Root cause (AI-assisted analysis) ====
> >
> > The regressing commit removes the rate-limiting from
> > kvm_gen_kvmclock_update(). Previously this function deferred the
> > all-vCPU kick via a 100ms delayed_work:
> >
> > /* 6.18 */
> > static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > {
> >         struct kvm *kvm = v->kvm;
> >
> >         /* Update only the calling vCPU now; defer the all-vCPU kick. */
> >         kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> >         schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> >                               KVMCLOCK_UPDATE_DELAY); /* 100ms */
> > }
> >
> > After the revert it kicks every vCPU of the VM synchronously on
> > every call:
> >
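> > /* 6.19 */
> > static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> > {
> >         struct kvm *kvm = v->kvm;
> >         struct kvm_vcpu *vcpu;
> >         unsigned long i;
> >
> >         /* Request an update on, and kick, every vCPU of the VM. */
> >         kvm_for_each_vcpu(i, vcpu, kvm) {
> >                 kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> >                 kvm_vcpu_kick(vcpu);
> >         }
> > }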
> >
> > KVM_REQ_GLOBAL_CLOCK_UPDATE, which calls kvm_gen_kvmclock_update(),
> > is issued on every vCPU load when use_master_clock is false
> > (arch/x86/kvm/x86.c, kvm_arch_vcpu_load):
> >
> > if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> >         kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> >
> > With many Firecracker microVMs, the vCPU scheduling rate is high.
> > Each scheduling event now IPIs every sibling vCPU of the VM, instead
> > of coalescing all-vCPU kicks into at most one per 100ms. This creates
> > a continuous IPI storm on the host, visible as high kernel (system)
> > CPU time and high steal time inside guest VMs.
> >
> > The commit justifies the removal with "Since kvmclock has been switched
> > to mono raw, this commit can be reverted." That reasoning is correct
> > for the NTP-correction use case, but the 100ms rate-limit also
> > protected against IPI storms when use_master_clock is false — a
> > concern independent of clock source.
> >
> > ==== Full bisect log ====
> >
> > git bisect start
> > # status: waiting for both good and bad commits
> > # good: [7d0a66e4bb9081d75c82ec4957c50034cb0ea449] Linux 6.18
> > git bisect good 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
> > # status: waiting for bad commit, 1 good commit known
> > # bad: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
> > git bisect bad 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
> > # good: [02892f90a9851f508e557b3c75e93fc178310d5f] Merge tag
> > 'hwmon-for-v6.19' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
> > git bisect good 02892f90a9851f508e557b3c75e93fc178310d5f
> > # bad: [edf602a17b03e6bca31c48f34ac8fc3341503ac1] Merge tag
> > 'tty-6.19-rc1' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> > git bisect bad edf602a17b03e6bca31c48f34ac8fc3341503ac1
> > # bad: [09cab48db950b6fb8c114314a20c0fd5a80cf990] Merge tag
> > 'soc-arm-6.19' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> > git bisect bad 09cab48db950b6fb8c114314a20c0fd5a80cf990
> > # good: [36492b7141b9abc967e92c991af32c670351dc16] Merge tag
> > 'tracepoints-v6.19' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
> > git bisect good 36492b7141b9abc967e92c991af32c670351dc16
> > # good: [7cd122b55283d3ceef71a5b723ccaa03a72284b4] Merge tag
> > 'pull-persistency' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> > git bisect good 7cd122b55283d3ceef71a5b723ccaa03a72284b4
> > # bad: [63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5] Merge tag
> > 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
> > git bisect bad 63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5
> > # bad: [adc99a6cfcf76d670272dea64bbc2d43ecd12a2f] Merge tag
> > 'kvm-x86-mmu-6.19' of https://github.com/kvm-x86/linux into HEAD
> > git bisect bad adc99a6cfcf76d670272dea64bbc2d43ecd12a2f
> > # bad: [c09816f2afce0f89f176c4bc58dc57ec9f204998] KVM: x86: Remove
> > unused declaration kvm_mmu_may_ignore_guest_pat()
> > git bisect bad c09816f2afce0f89f176c4bc58dc57ec9f204998
> > # bad: [f6106d41ec84e552a5e8adda1f8741cab96a5425] x86/bugs: Use an x86
> > feature to track the MMIO Stale Data mitigation
> > git bisect bad f6106d41ec84e552a5e8adda1f8741cab96a5425
> > # good: [9633f180ce994ab293ce4924a9b7aaf4673aa114] KVM: x86:
> > Explicitly set new periodic hrtimer expiration in apic_timer_fn()
> > git bisect good 9633f180ce994ab293ce4924a9b7aaf4673aa114
> > # bad: [e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98] KVM: x86: remove
> > comment about ntp correction sync for
> > git bisect bad e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98
> > # good: [a091fe60c2d3943b058132a64682a509d55bd325] KVM: x86: Grab
> > lapic_timer in a local variable to cleanup periodic code
> > git bisect good a091fe60c2d3943b058132a64682a509d55bd325
> > # bad: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm:
> > rate-limit global clock updates"
> > git bisect bad 446fcce2a52b533c543dabba26777813c347577c
> > # good: [43ddbf16edf5c1790684b32d5eb920a1b0eea285] Revert "x86: kvm:
> > introduce periodic global clock updates"
> > git bisect good 43ddbf16edf5c1790684b32d5eb920a1b0eea285
> > # first bad commit: [446fcce2a52b533c543dabba26777813c347577c] Revert
> > "x86: kvm: rate-limit global clock updates"
> >
> > Best regards,
> > Jaroslav Pulchart