Re: [PATCH v1 3/3] Introducing kvm_arch_set_irq_inatomic fast inject
From: Matthew Rosato
Date: Wed Mar 11 2026 - 16:26:35 EST
On 3/7/26 10:04 PM, Douglas Freimuth wrote:
> Patch 3: This patch introduces kvm_arch_set_irq_inatomic, a fast path
See comments from patch 1 of this series.
> for irq injection. Instead of placing all interrupts on the global work
> queue as it does today, this patch provides a fast path for irq
Maybe add a note that this depends on creating a path that cannot sleep, namely by pre-pinning the associated indicator pages, attempting allocations with GFP_ATOMIC on this path, and converting the ais_lock mutex to a spinlock.
This would also be a good point to mention that this path is fenced off for Secure Execution guests (here rather than in patch 1), because their indicator pages will not be pre-pinned.
> injection. Statistical counters have been added to enable analysis of irq injection on
> the fast path and slow path including io_390_inatomic,
> io_flic_inject_airq, io_set_adapter_int and io_390_inatomic_adapter_masked.
>
> In order to use this kernel series with virtio-pci, a QEMU that includes
> the 's390x/pci: set kvm_msi_via_irqfd_allowed' fix is needed.
> Additionally, the guest xml needs a thread pool and threads explicitly
> assigned per disk device using the common way of defining threads for
> disks.
This last paragraph, while relevant to testing you were doing, delves a bit too much into the specifics of when QEMU will/won't drive this code and doesn't need to be in kernel git history. I'd suggest either removing it or moving it to the cover-letter.
>
>
> Signed-off-by: Douglas Freimuth <freimuth@xxxxxxxxxxxxx>
> ---
> arch/s390/include/asm/kvm_host.h | 8 +-
> arch/s390/kvm/interrupt.c | 169 ++++++++++++++++++++++++++-----
> arch/s390/kvm/kvm-s390.c | 24 ++++-
> arch/s390/kvm/kvm-s390.h | 3 +-
> 4 files changed, 170 insertions(+), 34 deletions(-)
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 616be8ca4614..a194e9808ae3 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -359,7 +359,7 @@ struct kvm_s390_float_interrupt {
> struct kvm_s390_mchk_info mchk;
> struct kvm_s390_ext_info srv_signal;
> int last_sleep_cpu;
> - struct mutex ais_lock;
> + spinlock_t ais_lock;
> u8 simm;
> u8 nimm;
> };
> @@ -450,6 +450,10 @@ struct kvm_vm_stat {
> u64 inject_io;
> u64 io_390_adapter_map;
> u64 io_390_adapter_unmap;
> + u64 io_390_inatomic;
> + u64 io_flic_inject_airq;
> + u64 io_set_adapter_int;
> + u64 io_390_inatomic_adapter_masked;
> u64 inject_float_mchk;
> u64 inject_pfault_done;
> u64 inject_service_signal;
> @@ -481,7 +485,7 @@ struct s390_io_adapter {
> bool masked;
> bool swap;
> bool suppressible;
> - struct rw_semaphore maps_lock;
> + spinlock_t maps_lock;
Patch 1 (re-)introduced the maps_lock, now you are converting it to a spinlock 2 patches later.
I realize that you were bringing back code that was previously removed by
f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages"),
but for this series wouldn't it make sense to introduce maps_lock as a spinlock in patch 1, rather than re-introducing the semaphore for the span of two commits?
> @@ -2700,6 +2706,8 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
> unsigned int id = attr->attr;
> struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
>
> + kvm->stat.io_flic_inject_airq++;
> +
This just tracks how often the function is called, including instances where the adapter is NULL or the call to kvm_s390_inject_airq fails.
Do you actually want to track only the number of successful injects, or every time we enter this routine?
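If the goal is successful injects only, something along these lines would do it (untested sketch; assumes adding a local ret to flic_inject_airq):

	ret = kvm_s390_inject_airq(kvm, adapter);
	if (!ret)
		kvm->stat.io_flic_inject_airq++;
	return ret;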
> @@ -2919,15 +2968,15 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
> int ret;
> struct s390_io_adapter *adapter;
>
> + kvm->stat.io_set_adapter_int++;
> +
Same comment: would we rather track only the successful cases, or every entry into the function, including calls filtered out by checks like the 0->1 transition below?
Actually, the addition of this counter as well as io_flic_inject_airq seems like it should be a separate patch.
> /* We're only interested in the 0->1 transition. */
> if (!level)
> return 0;
...
> +int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
> + struct kvm *kvm, int irq_source_id, int level,
> + bool line_status)
> +{
> + int ret;
> + struct s390_io_adapter *adapter;
> + struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
> + struct kvm_s390_interrupt_info *inti;
> + struct kvm_s390_interrupt s390int = {
> + .type = KVM_S390_INT_IO(1, 0, 0, 0),
> + .parm = 0,
> + };
> +
> + kvm->stat.io_390_inatomic++;
Is this the right place to increment this counter? There are plenty of reasons we could return -EWOULDBLOCK after this point and fall back to scheduling the work.
As written, this only counts how often we are called from irqfd_wakeup(), not how often we actually make it through the inatomic operation successfully.
> +
> + /* We're only interested in the 0->1 transition. */
> + if (!level)
> + return -EWOULDBLOCK;
> + if (e->type != KVM_IRQ_ROUTING_S390_ADAPTER)
> + return -EWOULDBLOCK;
> +
> + adapter = get_io_adapter(kvm, e->adapter.adapter_id);
> + if (!adapter)
> + return -EWOULDBLOCK;
> +
> + s390int.parm64 = isc_to_int_word(adapter->isc);
> + ret = adapter_indicators_set_fast(kvm, adapter, &e->adapter);
> + if (ret < 0)
> + return -EWOULDBLOCK;
We know for sure that a guest running in Secure Execution will always hit this, because no mappings will be available.
Did you test whether it would be more efficient to check the kvm for SE at the start of kvm_arch_set_irq_inatomic() and immediately return -EWOULDBLOCK there?
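e.g. something like this near the top of the function (untested; assumes kvm_s390_pv_get_handle() is safe to read without kvm->lock in this context):

	/* SE guests have no pre-pinned indicator pages, take the slow path */
	if (kvm_s390_pv_get_handle(kvm))
		return -EWOULDBLOCK;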
> + if (!ret || adapter->masked) {
> + kvm->stat.io_390_inatomic_adapter_masked++;
> + return 0;
> + }
> +
> + inti = kzalloc_obj(*inti, GFP_ATOMIC);
> + if (!inti)
> + return -EWOULDBLOCK;
> +
> + if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
> + ret = kvm_s390_inject_vm(kvm, &s390int, inti);
> + if (ret == 0)
This seems to me like the right spot for kvm->stat.io_390_inatomic++, unless I misunderstand the intent of the counter.
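i.e. only bump the counter once the atomic inject has actually succeeded, roughly (mirroring the quoted code, untested):

	ret = kvm_s390_inject_vm(kvm, &s390int, inti);
	if (ret == 0)
		kvm->stat.io_390_inatomic++;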