Re: [PATCH v4 4/4] KVM: Avoid synchronize_srcu() in kvm_io_bus_register_dev()

From: Keir Fraser

Date: Thu Feb 19 2026 - 02:51:05 EST


On Wed, Feb 18, 2026 at 04:15:33PM +0000, Nikita Kalyazin wrote:
>
>
> On 18/02/2026 16:02, Keir Fraser wrote:
> > On Wed, Feb 18, 2026 at 12:55:11PM +0000, Nikita Kalyazin wrote:
> > >
> > >
> > > On 17/02/2026 19:07, Sean Christopherson wrote:
> > > > On Mon, Feb 16, 2026, Nikita Kalyazin wrote:
> > > > > On 13/02/2026 23:20, Sean Christopherson wrote:
> > > > > > On Fri, Feb 13, 2026, Nikita Kalyazin wrote:
> > > > > > > > I am not aware of a way to make it fast for both use cases and would be more
> > > > > > > than happy to hear about possible solutions.
> > > > > >
> > > > > > > What if we key off of vCPUs being created? The motivation for Keir's change was
> > > > > > to avoid stalling during VM boot, i.e. *after* initial VM creation.
> > > > >
> > > > > It doesn't work as is on x86 because the delay we're seeing occurs after the
> > > > > > created_vcpus gets incremented
> > > >
> > > > I don't follow, the suggestion was to key off created_vcpus in
> > > > kvm_io_bus_register_dev(), not in kvm_swap_active_memslots(). I can totally
> > > > imagine the patch not working, but the ordering in kvm_vm_ioctl_create_vcpu()
> > > > should be largely irrelevant.
> > >
> > > Yes, you're right, it's irrelevant. I had made the change in
> > > kvm_io_bus_register_dev() like proposed, but have no idea how I couldn't see
> > > the effect. I retested it now and it's obvious that it works on x86. Sorry
> > > for the confusion.
> > >
> > > >
> > > > Probably a moot point though.
> > >
> > > Yes, this will not solve the problem on ARM.
> >
> > Sorry for being late to this thread. I'm a bit confused now. Did
> > Sean's original patch (reintroducing the old logic, based on whether
> > any vcpus have been created) work for both/either/neither arch? I
> > would have expected it to work for both ARM and X86, despite the
> > offending synchronize_srcu() not being in the vcpu-creation ioctl on
> > ARM, and I think that is finally what your testing seems to show? If
> > so then that seems the pragmatic if somewhat ugly way forward.
>
> The original patch from Sean works for x86. I didn't test it on ARM as it's
> harder for me to do, but I don't expect it to work because it only affects
> the pre-vcpu-creation phase.

Ok, looking closer at one of your previous replies, the first fix
doesn't work for you on ARM because there your vcpu creations occur
earlier than on X86? Fair enough.
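
For anyone skimming the thread, the first fix (keying off created_vcpus
in kvm_io_bus_register_dev()) can be modelled roughly as below. This is
a userspace sketch under assumptions, not the actual patch: struct
vm_model, enum bus_free_mode and pick_bus_free_mode() are hypothetical
names, and mapping "no vCPUs yet" to the old synchronous wait is my
reading of the discussion, not something taken from the patch text.

```c
/* Hypothetical stand-in for the one field of struct kvm that matters here. */
struct vm_model {
	int created_vcpus;
};

/* Hypothetical labels for the two ways the old bus could be freed. */
enum bus_free_mode {
	BUS_FREE_SYNC,		/* old logic: block, as synchronize_srcu_expedited() does */
	BUS_FREE_DEFERRED,	/* 4/4 patch: queue the free, as call_srcu() does */
};

/*
 * Assumption: before any vCPU exists (initial VM creation) the old
 * synchronous wait is used; once vCPUs exist (guest boot) the free
 * is deferred.
 */
static enum bus_free_mode pick_bus_free_mode(const struct vm_model *vm)
{
	return vm->created_vcpus == 0 ? BUS_FREE_SYNC : BUS_FREE_DEFERRED;
}
```

Under this model, a VMM that creates its vCPUs before the registrations
that matter always takes the deferred path, so the check changes nothing
for it.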

> We discussed the second patch at the KVM sync earlier today, then I retested
> it and it appears to solve the issue for both, but I'm going to have more
> complete results tomorrow.
>
> Are you by chance able to check whether KVM_SET_USER_MEMORY_REGION
> execution takes longer on ARM in your environment (with the 4/4 patch)? I'd be
> curious to know why not if it doesn't.

On our VMM (crosvm) the kvm_io_bus_register_dev() calls happen much later,
during actual VM boot (device probe phase), so the results would not
be comparable. In our scenario we generally save milliseconds on every
single kvm_io_bus_register_dev() invocation.

> >
> > Cheers,
> > Keir
> >
> >
> > > >
> > > > > > so it doesn't allow differentiating between the two
> > > > > cases (below is kvm_vm_ioctl_create_vcpu):
> > > > >
> > > > > > 	kvm->created_vcpus++; // <===== incremented here
> > > > > > 	mutex_unlock(&kvm->lock);
> > > > > >
> > > > > > 	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL_ACCOUNT);
> > > > > > 	if (!vcpu) {
> > > > > > 		r = -ENOMEM;
> > > > > > 		goto vcpu_decrement;
> > > > > > 	}
> > > > > >
> > > > > > 	BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE);
> > > > > > 	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > > > > > 	if (!page) {
> > > > > > 		r = -ENOMEM;
> > > > > > 		goto vcpu_free;
> > > > > > 	}
> > > > > > 	vcpu->run = page_address(page);
> > > > > >
> > > > > > 	kvm_vcpu_init(vcpu, kvm, id);
> > > > > >
> > > > > > 	r = kvm_arch_vcpu_create(vcpu); // <===== the delay is here
> > > > >
> > > > >
> > > > > firecracker 583 [001] 151.297145: probe:synchronize_srcu_expedited:
> > > > > (ffffffff813e5cf0)
> > > > > ffffffff813e5cf1 synchronize_srcu_expedited+0x1 ([kernel.kallsyms])
> > > > > ffffffff81234986 kvm_swap_active_memslots+0x136 ([kernel.kallsyms])
> > > > > ffffffff81236cdd kvm_set_memslot+0x1cd ([kernel.kallsyms])
> > > > > ffffffff81237518 kvm_set_memory_region.part.0+0x478 ([kernel.kallsyms])
> > > > > ffffffff81264dbc __x86_set_memory_region+0xec ([kernel.kallsyms])
> > > > > ffffffff8127e2dc kvm_alloc_apic_access_page+0x5c ([kernel.kallsyms])
> > > > > ffffffff812b9ed3 vmx_vcpu_create+0x193 ([kernel.kallsyms])
> > > > > ffffffff8126788a kvm_arch_vcpu_create+0x1da ([kernel.kallsyms])
> > > > > ffffffff8123c54c kvm_vm_ioctl+0x5fc ([kernel.kallsyms])
> > > > > ffffffff8167b331 __x64_sys_ioctl+0x91 ([kernel.kallsyms])
> > > > > ffffffff8251a89c do_syscall_64+0x4c ([kernel.kallsyms])
> > > > > ffffffff8100012b entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> > > > > 6512de ioctl+0x32 (/mnt/host/firecracker)
> > > > > d99a7 std::rt::lang_start+0x37 (/mnt/host/firecracker)
> > > > >
> > > > > Also, given that it stumbles after the KVM_CREATE_VCPU on ARM (in
> > > > > KVM_SET_USER_MEMORY_REGION), it doesn't look like a universal solution.
> > > >
> > > > Hmm. Under the hood, __synchronize_srcu() itself uses __call_srcu, so I _think_
> > > > the only practical difference (aside from waiting, obviously) between call_srcu()
> > > > and synchronize_srcu_expedited() with respect to "transferring" grace period
> > > > latency is that using call_srcu() could start a normal, non-expedited grace period.
> > > >
> > > > IIUC, SRCU has best-effort logic to shift in-flight non-expedited grace periods
> > > > to expedited mode, but if the normal grace period has already started the timer
> > > > for the delayed invocation of process_srcu(), then SRCU will still wait for one
> > > > > jiffy, i.e. won't immediately queue the work.
> > > >
> > > > I have no idea if this is sane and/or acceptable, but before looping in Paul and
> > > > others, can you try this to see if it helps?
> > >
> > > That's exactly what I tried myself before and it didn't help, probably for
> > > the reason you mentioned above (a normal GP being already started).
> > >
> > > >
> > > > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > > > index 344ad51c8f6c..30437dc8d818 100644
> > > > --- a/include/linux/srcu.h
> > > > +++ b/include/linux/srcu.h
> > > > @@ -89,6 +89,8 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
> > > >
> > > > >  void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
> > > > >  		void (*func)(struct rcu_head *head));
> > > > > +void call_srcu_expedited(struct srcu_struct *ssp, struct rcu_head *rhp,
> > > > > +			 rcu_callback_t func);
> > > > >  void cleanup_srcu_struct(struct srcu_struct *ssp);
> > > > >  void synchronize_srcu(struct srcu_struct *ssp);
> > > >
> > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > index ea3f128de06f..03333b079092 100644
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -1493,6 +1493,13 @@ void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
> > > > }
> > > > EXPORT_SYMBOL_GPL(call_srcu);
> > > >
> > > > > +void call_srcu_expedited(struct srcu_struct *ssp, struct rcu_head *rhp,
> > > > > +			 rcu_callback_t func)
> > > > > +{
> > > > > +	__call_srcu(ssp, rhp, func, rcu_gp_is_normal());
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(call_srcu_expedited);
> > > > +
> > > > /*
> > > > * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
> > > > */
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 737b74b15bb5..26215f98c98f 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -6036,7 +6036,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> > > > >  	memcpy(new_bus->range + i + 1, bus->range + i,
> > > > >  		(bus->dev_count - i) * sizeof(struct kvm_io_range));
> > > > >  	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> > > > > -	call_srcu(&kvm->srcu, &bus->rcu, __free_bus);
> > > > > +	call_srcu_expedited(&kvm->srcu, &bus->rcu, __free_bus);
> > > >
> > > > return 0;
> > > > }
> > >
>