Re: [PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()

From: Sean Christopherson

Date: Fri Feb 13 2026 - 18:20:59 EST


On Fri, Feb 13, 2026, Nikita Kalyazin wrote:
>
>
> On 09/09/2025 11:00, Keir Fraser wrote:
> > Device MMIO registration may happen quite frequently during VM boot,
> > and the SRCU synchronization each time has a measurable effect
> > on VM startup time. In our experiments it can account for around 25%
> > of a VM's startup time.
> >
> > Replace the synchronization with a deferred free of the old kvm_io_bus
> > structure.
>
>
> Hi,
>
> We noticed that this change introduced a regression of ~20 ms to the first
> KVM_CREATE_VCPU call of a VM, which is significant for our use case.
>
> Before the patch:
> 45726 14:45:32.914330 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.000137>
> 45726 14:45:32.914533 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000046>
>
> After the patch:
> 30295 14:47:08.057412 ioctl(25, KVM_CREATE_VCPU, 0) = 28 <0.025182>
> 30295 14:47:08.082663 ioctl(25, KVM_CREATE_VCPU, 1) = 30 <0.000031>
>
> The reason, as I understand it, is that the call_srcu() calls made from
> kvm_io_bus_register_dev() add callbacks to be invoked after a normal
> GP, which is 10 ms with HZ=100. The subsequent synchronize_srcu_expedited()
> called from kvm_swap_active_memslots() (from KVM_CREATE_VCPU) has to wait
> for the normal GP to complete before making progress. I don't fully
> understand why the delay is consistently greater than 1 GP, but that's what
> we see across our testing scenarios.
>
> I verified that the problem is mitigated if the GP is reduced by configuring
> HZ=1000. In that case, the regression is on the order of 1 ms.
>
> It looks like in our case we don't benefit much from the intended
> optimisation as the number of device MMIO registrations is limited and
> they don't cost us much (each takes at most 16 us, but most commonly ~6 us):

Maybe differences in platforms for arm64 vs x86?

> I am not aware of a way to make it fast for both use cases and would be more
> than happy to hear about possible solutions.

What if we key off of vCPUs being created? The motivation for Keir's change was
to avoid stalling during VM boot, i.e. *after* initial VM creation.
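
As a rough user-space illustration of that idea only (not kernel code): the
sketch below picks a tactic for freeing the old bus based on whether any vCPUs
exist yet. The names toy_vm, free_tactic and pick_free_tactic() are
hypothetical and exist only for this example; the real check would live in
kvm_io_bus_register_dev() and use the SRCU primitives.

```c
#include <assert.h>

/* Hypothetical stand-in for the one field of struct kvm that matters here. */
struct toy_vm {
	int created_vcpus;
};

/* Hypothetical labels for the two ways of freeing the old bus. */
enum free_tactic {
	FREE_SYNC,	/* wait for readers now, then free immediately */
	FREE_DEFERRED,	/* queue a callback and free after a grace period */
};

/*
 * Before any vCPU exists, readers are unlikely, so waiting is assumed to be
 * cheap and avoids starting a grace period that a later synchronization
 * would have to sit behind.  Once vCPUs exist, defer the free so that
 * registration itself doesn't stall.
 */
static enum free_tactic pick_free_tactic(const struct toy_vm *vm)
{
	assert(vm);
	return vm->created_vcpus ? FREE_DEFERRED : FREE_SYNC;
}
```

In this model, a VM with created_vcpus == 0 takes the synchronous path, and
any non-zero count switches to the deferred path.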

--
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Fri, 13 Feb 2026 15:15:01 -0800
Subject: [PATCH] KVM: Synchronize SRCU on I/O device registration if vCPUs
haven't been created

TODO: Write a changelog if this works.

Fixes: 7d9a0273c459 ("KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()")
Reported-by: Nikita Kalyazin <kalyazin@xxxxxxxxxx>
Closes: https://lkml.kernel.org/r/a84ddba8-12da-489a-9dd1-ccdf7451a1ba%40amazon.com
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 virt/kvm/kvm_main.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 571cf0d6ec01..043b1c3574ab 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6027,7 +6027,30 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 	memcpy(new_bus->range + i + 1, bus->range + i,
 		(bus->dev_count - i) * sizeof(struct kvm_io_range));
 	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
-	call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
+
+	/*
+	 * To optimize VM creation *and* boot time, use different tactics for
+	 * safely freeing the old bus based on where the VM is at in its
+	 * lifecycle.  If vCPUs haven't yet been created, simply synchronize
+	 * and free, as there are unlikely to be active SRCU readers; if not,
+	 * defer freeing the bus via SRCU callback.
+	 *
+	 * If there are active SRCU readers, synchronizing will stall until the
+	 * current grace period completes, which can meaningfully impact boot
+	 * time for VMs that trigger a large number of registrations.
+	 *
+	 * If there aren't SRCU readers, using an SRCU callback can be a net
+	 * negative due to starting a grace period of its own, which in turn
+	 * can unnecessarily cause a future synchronization to stall.  E.g. if
+	 * devices are registered before memslots are created, then creating
+	 * the first memslot will have to wait for a superfluous grace period.
+	 */
+	if (!READ_ONCE(kvm->created_vcpus)) {
+		synchronize_srcu_expedited(&kvm->srcu);
+		kfree(bus);
+	} else {
+		call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
+	}
 
 	return 0;
 }

base-commit: 183bb0ce8c77b0fd1fb25874112bc8751a461e49
--