Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time

From: Lei Chen

Date: Sun Mar 22 2026 - 22:28:08 EST


Hi Jaroslav,

Thanks for testing and reporting this. I'm looking into the problem.

Best regards
Lei Chen

On Sat, Mar 21, 2026 at 10:33 PM Jaroslav Pulchart
<jaroslav.pulchart@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> I am reporting a performance regression in Linux 6.19 that severely
> impacts KVM hosts running many Firecracker microVMs.
>
> == Bisect result ==
>
> 446fcce2a52b533c543dabba26777813c347577c is the first bad commit
> commit 446fcce2a52b533c543dabba26777813c347577c
> Author: Lei Chen <lei.chen@xxxxxxxxxx>
> Date: Tue Aug 19 23:20:26 2025 +0800
>
> Revert "x86: kvm: rate-limit global clock updates"
>
> This reverts commit 7e44e4495a398eb553ce561f29f9148f40a3448f.
>
> Commit 7e44e4495a39 ("x86: kvm: rate-limit global clock updates")
> intends to use a kvmclock_update_work to sync ntp correction
> across all vcpus kvmclock, which is based on commit 0061d53daf26f
> ("KVM: x86: limit difference between kvmclock updates")
>
> Since kvmclock has been switched to mono raw, this commit can be
> reverted.
>
> Signed-off-by: Lei Chen <lei.chen@xxxxxxxxxx>
> Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@xxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/x86.c | 29 ++++-------------------------
> 2 files changed, 4 insertions(+), 26 deletions(-)
>
> ==== Symptoms ====
>
> Measured on a KVM host running many Firecracker microVMs
> (node_exporter metrics, 2026-03-20):
>
> kernel 6.19:
> steal time inside guest VMs: 3–24% per vCPU (sustained)
> host system CPU (kernel mode): 3–12 CPUs saturated
> host steal: 3–8%
>
> kernel 6.18 (same host, same workload after rollback):
> steal time inside guest VMs: < 0.02% per vCPU (~200x lower)
> host system CPU (kernel mode): 2–3 CPUs
> host steal: 0.3–0.5%
>
> ==== Root cause (AI-assisted analysis) ====
>
> The regressing commit removes the rate-limiting from
> kvm_gen_kvmclock_update(). Previously this function deferred the
> all-vCPU kick via a 100ms delayed_work:
>
> /* 6.18 */
> static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> {
> 	struct kvm *kvm = v->kvm;
>
> 	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
> 	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
> 			      KVMCLOCK_UPDATE_DELAY); /* 100ms */
> }
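The coalescing in the 6.18 code comes from schedule_delayed_work() doing nothing while the work item is already pending. The sketch below is a toy userspace model of that behaviour, not kernel code: the toy_* names are hypothetical, and it assumes the work runs exactly at its due time (in the kernel it can only run later, which coalesces even more). It shows how a stream of requests collapses into at most one run of the all-vCPU kick loop per delay window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a delayed_work item (hypothetical, userspace only). */
struct toy_delayed_work {
	bool pending;
	uint64_t fire_at_ms;	/* when the pending work will run */
};

/* Mirrors schedule_delayed_work(): queue only if not already pending. */
static bool toy_schedule(struct toy_delayed_work *w, uint64_t now_ms,
			 uint64_t delay_ms)
{
	if (w->pending)
		return false;	/* already queued: this request is absorbed */
	w->pending = true;
	w->fire_at_ms = now_ms + delay_ms;
	return true;
}

/*
 * Replay sorted request timestamps and return how many times the
 * all-vCPU kick loop (the delayed work in 6.18) would run.
 */
static unsigned int toy_count_work_runs(const uint64_t *req_ms, unsigned int n,
					uint64_t delay_ms)
{
	struct toy_delayed_work w = { .pending = false, .fire_at_ms = 0 };
	unsigned int runs = 0;

	for (unsigned int i = 0; i < n; i++) {
		/* timestamps must be in non-decreasing order */
		assert(i == 0 || req_ms[i] >= req_ms[i - 1]);

		/* Run the work first if it became due before this request. */
		if (w.pending && w.fire_at_ms <= req_ms[i]) {
			w.pending = false;
			runs++;
		}
		(void)toy_schedule(&w, req_ms[i], delay_ms);
	}
	if (w.pending)	/* the last queued work eventually runs too */
		runs++;
	return runs;
}
```

For example, 100 requests spaced 10ms apart with a 100ms delay give 10 runs of the kick loop; with a 0ms delay (effectively no rate limit) every one of the 100 requests gets its own run.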
>
> After the revert it kicks every vCPU of the VM synchronously on
> every call:
>
> /* 6.19 */
> static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> {
> 	unsigned long i;
> 	struct kvm_vcpu *vcpu;
> 	struct kvm *kvm = v->kvm;
>
> 	kvm_for_each_vcpu(i, vcpu, kvm) {
> 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> 		kvm_vcpu_kick(vcpu);
> 	}
> }
>
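> KVM_REQ_GLOBAL_CLOCK_UPDATE, whose handler in vcpu_enter_guest() calls
> kvm_gen_kvmclock_update(), is raised from kvm_arch_vcpu_load()
> (arch/x86/kvm/x86.c) when the vCPU is loaded on a different host CPU
> than last time (or the host TSC is unstable) and either
> use_master_clock is false or this is the vCPU's first load: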
>
> if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> 	kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
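Put as a predicate, the trigger looks roughly like the sketch below. It is a hypothetical userspace helper, not kernel code, and it assumes the check above sits inside the block guarded by "if (unlikely(vcpu->cpu != cpu) || kvm_check_tsc_unstable())" in kvm_arch_vcpu_load(), as in recent arch/x86/kvm/x86.c. With a master clock, plain migrations between host CPUs do not raise the request; without one, every migration does.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: does loading a vCPU on host CPU new_cpu raise
 * KVM_REQ_GLOBAL_CLOCK_UPDATE? prev_cpu is the host CPU it last ran on,
 * or -1 if it has never run.
 */
static bool toy_raises_global_clock_update(bool use_master_clock,
					   int prev_cpu, int new_cpu,
					   bool tsc_unstable)
{
	assert(new_cpu >= 0);	/* must be loaded on a real host CPU */

	/* Assumed outer guard: only on migration or an unstable host TSC. */
	if (prev_cpu == new_cpu && !tsc_unstable)
		return false;

	/* The quoted check: no master clock, or the vCPU's first load. */
	return !use_master_clock || prev_cpu == -1;
}
```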
>
> With many Firecracker microVMs, the vCPU scheduling rate is high.
> Each vCPU load that raises KVM_REQ_GLOBAL_CLOCK_UPDATE now kicks every
> vCPU of the VM, instead of these kicks being coalesced into at most one
> all-vCPU round per 100ms. This creates
> a continuous IPI storm on the host, visible as high kernel (system)
> CPU time and high steal time inside guest VMs.
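A back-of-the-envelope bound on kvm_vcpu_kick() calls per VM makes the difference concrete. The sketch below is illustrative arithmetic, not a measurement: the toy_* helpers are hypothetical, the request rate and vCPU count in the example are made-up values, and kicks are only an upper bound on IPIs, since kvm_vcpu_kick() sends an IPI only to a vCPU running in guest mode on another CPU (a halted vCPU is woken instead).

```c
#include <assert.h>
#include <stdint.h>

/* 6.19: every KVM_REQ_GLOBAL_CLOCK_UPDATE kicks every vCPU of the VM. */
static uint64_t toy_kicks_unlimited(uint64_t req_per_sec, uint64_t nr_vcpus)
{
	return req_per_sec * nr_vcpus;
}

/*
 * 6.18: requests only arm the delayed work. Runs of the all-vCPU loop
 * are at least delay_ms apart, so a one-second window holds at most
 * ceil(1000 / delay_ms) of them, and (roughly) no more runs than requests.
 */
static uint64_t toy_kicks_rate_limited(uint64_t req_per_sec, uint64_t nr_vcpus,
				       uint64_t delay_ms)
{
	uint64_t max_runs;

	assert(delay_ms > 0);
	max_runs = (1000 + delay_ms - 1) / delay_ms;	/* ceil(1000 / delay_ms) */
	if (req_per_sec < max_runs)
		max_runs = req_per_sec;
	return max_runs * nr_vcpus;
}
```

With a hypothetical 2000 requests/s on an 8-vCPU VM, that is 16000 kicks/s without the rate limit versus at most 80 kicks/s with the 100ms window.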
>
> The commit justifies the removal with "Since kvmclock has been switched
> to mono raw, this commit can be reverted." That reasoning is correct
> for the NTP-correction use case, but the 100ms rate-limit also
> protected against IPI storms when use_master_clock is false — a
> concern independent of clock source.
>
> ==== Full bisect log ====
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [7d0a66e4bb9081d75c82ec4957c50034cb0ea449] Linux 6.18
> git bisect good 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
> # status: waiting for bad commit, 1 good commit known
> # bad: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
> git bisect bad 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
> # good: [02892f90a9851f508e557b3c75e93fc178310d5f] Merge tag 'hwmon-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
> git bisect good 02892f90a9851f508e557b3c75e93fc178310d5f
> # bad: [edf602a17b03e6bca31c48f34ac8fc3341503ac1] Merge tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
> git bisect bad edf602a17b03e6bca31c48f34ac8fc3341503ac1
> # bad: [09cab48db950b6fb8c114314a20c0fd5a80cf990] Merge tag 'soc-arm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect bad 09cab48db950b6fb8c114314a20c0fd5a80cf990
> # good: [36492b7141b9abc967e92c991af32c670351dc16] Merge tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
> git bisect good 36492b7141b9abc967e92c991af32c670351dc16
> # good: [7cd122b55283d3ceef71a5b723ccaa03a72284b4] Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect good 7cd122b55283d3ceef71a5b723ccaa03a72284b4
> # bad: [63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5] Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
> git bisect bad 63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5
> # bad: [adc99a6cfcf76d670272dea64bbc2d43ecd12a2f] Merge tag 'kvm-x86-mmu-6.19' of https://github.com/kvm-x86/linux into HEAD
> git bisect bad adc99a6cfcf76d670272dea64bbc2d43ecd12a2f
> # bad: [c09816f2afce0f89f176c4bc58dc57ec9f204998] KVM: x86: Remove unused declaration kvm_mmu_may_ignore_guest_pat()
> git bisect bad c09816f2afce0f89f176c4bc58dc57ec9f204998
> # bad: [f6106d41ec84e552a5e8adda1f8741cab96a5425] x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation
> git bisect bad f6106d41ec84e552a5e8adda1f8741cab96a5425
> # good: [9633f180ce994ab293ce4924a9b7aaf4673aa114] KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn()
> git bisect good 9633f180ce994ab293ce4924a9b7aaf4673aa114
> # bad: [e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98] KVM: x86: remove comment about ntp correction sync for
> git bisect bad e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98
> # good: [a091fe60c2d3943b058132a64682a509d55bd325] KVM: x86: Grab lapic_timer in a local variable to cleanup periodic code
> git bisect good a091fe60c2d3943b058132a64682a509d55bd325
> # bad: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm: rate-limit global clock updates"
> git bisect bad 446fcce2a52b533c543dabba26777813c347577c
> # good: [43ddbf16edf5c1790684b32d5eb920a1b0eea285] Revert "x86: kvm: introduce periodic global clock updates"
> git bisect good 43ddbf16edf5c1790684b32d5eb920a1b0eea285
> # first bad commit: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm: rate-limit global clock updates"
>
> Best regards,
> Jaroslav Pulchart