Re: [PATCH v1 3/3] Introducing kvm_arch_set_irq_inatomic fast inject
From: Douglas Freimuth
Date: Fri Mar 13 2026 - 10:02:08 EST
On 3/11/26 4:24 PM, Matthew Rosato wrote:
On 3/7/26 10:04 PM, Douglas Freimuth wrote:
Patch 3: This patch introduces kvm_arch_set_irq_inatomic, a fast path
for irq injection. Instead of placing all interrupts on the global work
queue as it does today, this patch provides a fast path for irq
injection. Statistical counters have been added to enable analysis of irq
injection on the fast path and slow path, including io_390_inatomic,
io_flic_inject_airq, io_set_adapter_int and io_390_inatomic_adapter_masked.
See comments from patch 1 of this series.
Maybe add a bit about how this depends on creating a path that cannot lose control, namely by pre-pinning the associated indicator pages, conditionally attempting allocations via GFP_ATOMIC, and switching a mutex to a spinlock.
This would also be a good point to work in the fact that this is fenced for Secure Execution guests rather than in patch 1, because the indicator pages will not be pre-pinned there.
In order to use this kernel series with virtio-pci, a QEMU that includes the 's390x/pci: set kvm_msi_via_irqfd_allowed' fix is needed.
Additionally, the guest XML needs a thread pool and threads explicitly
assigned per disk device, using the common way of defining threads for
disks.
This last paragraph, while relevant to testing you were doing, delves a bit too much into the specifics of when QEMU will/won't drive this code and doesn't need to be in kernel git history. I'd suggest either removing it or moving it to the cover-letter.
Signed-off-by: Douglas Freimuth <freimuth@xxxxxxxxxxxxx>
---
arch/s390/include/asm/kvm_host.h | 8 +-
arch/s390/kvm/interrupt.c | 169 ++++++++++++++++++++++++++-----
arch/s390/kvm/kvm-s390.c | 24 ++++-
arch/s390/kvm/kvm-s390.h | 3 +-
4 files changed, 170 insertions(+), 34 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 616be8ca4614..a194e9808ae3 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -359,7 +359,7 @@ struct kvm_s390_float_interrupt {
struct kvm_s390_mchk_info mchk;
struct kvm_s390_ext_info srv_signal;
int last_sleep_cpu;
- struct mutex ais_lock;
+ spinlock_t ais_lock;
u8 simm;
u8 nimm;
};
@@ -450,6 +450,10 @@ struct kvm_vm_stat {
u64 inject_io;
u64 io_390_adapter_map;
u64 io_390_adapter_unmap;
+ u64 io_390_inatomic;
+ u64 io_flic_inject_airq;
+ u64 io_set_adapter_int;
+ u64 io_390_inatomic_adapter_masked;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
@@ -481,7 +485,7 @@ struct s390_io_adapter {
bool masked;
bool swap;
bool suppressible;
- struct rw_semaphore maps_lock;
+ spinlock_t maps_lock;
Patch 1 (re-)introduced the maps_lock, now you are converting it to a spinlock 2 patches later.
I realize that you were bringing back code that was previously removed by
commit f65470661f36 ("KVM: s390/interrupt: do not pin adapter interrupt pages"),
but for this series wouldn't it make sense to just start by introducing maps_lock as a spinlock from patch 1 vs re-introducing the semaphore for the span of 2 commits?
Matt, thank you for your input. The policy that individual patches should not only compile but also have individual utility and stand on their own applies here. In patches 1 and 2 the maps_lock is more flexible as a semaphore, providing utility for those patches, while in patch 3 the maps_lock has to be a spinlock so the inatomic path can take it with interrupts disabled.
@@ -2700,6 +2706,8 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
unsigned int id = attr->attr;
struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ kvm->stat.io_flic_inject_airq++;
+
This just tracks how often the function is called, and includes instances where the adapter is NULL or the call to kvm_s390_inject_airq failed.
Do you want to actually track the number of successful injects only vs every time we call this routine?
This response applies to all of the following comments about counters, including this one. At this stage I want to focus on statistics for the number of times we enter the top of the relevant functions.

For example, if we put the counter at the bottom of flic_inject_airq or kvm_arch_set_irq_inatomic, it will be silent in cases of a failed conditional. On failure of inatomic we would only see the flic counter going up, which could be misleading.

The current counter placement shows both that the correct path of execution is being taken and the volume, which is valuable for initial deployments: it tells us the system is configured correctly and provides performance insights. We expect to see a performance improvement with inatomic fast inject; if system performance doesn't improve when it is deployed, the first thing to check is the volume of each of these counters. As we observe steady-state usage behavior in the field we could then consider a different focus for the counters.
@@ -2919,15 +2968,15 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
int ret;
struct s390_io_adapter *adapter;
+ kvm->stat.io_set_adapter_int++;
+
Same comment, would we rather track only the successful cases or only count times we enter the function including things like the 0->1 transition below?
Actually, the addition of this counter as well as io_flic_inject_airq seems like it should be a separate patch.
/* We're only interested in the 0->1 transition. */
if (!level)
return 0;
...
+int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level,
+ bool line_status)
+{
+ int ret;
+ struct s390_io_adapter *adapter;
+ struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ struct kvm_s390_interrupt_info *inti;
+ struct kvm_s390_interrupt s390int = {
+ .type = KVM_S390_INT_IO(1, 0, 0, 0),
+ .parm = 0,
+ };
+
+ kvm->stat.io_390_inatomic++;
Is this the right time to increment this value? There are plenty of reasons we could -EWOULDBLOCK after this point and fall back to scheduling it.
So this will only count how often we call here from irqfd_wakeup() and not how often we actually successfully make it thru the inatomic operation.
+
+ /* We're only interested in the 0->1 transition. */
+ if (!level)
+ return -EWOULDBLOCK;
+ if (e->type != KVM_IRQ_ROUTING_S390_ADAPTER)
+ return -EWOULDBLOCK;
+
+ adapter = get_io_adapter(kvm, e->adapter.adapter_id);
+ if (!adapter)
+ return -EWOULDBLOCK;
+
+ s390int.parm64 = isc_to_int_word(adapter->isc);
+ ret = adapter_indicators_set_fast(kvm, adapter, &e->adapter);
+ if (ret < 0)
+ return -EWOULDBLOCK;
We know for sure that a guest that is running in SE will always hit this because no mappings will be available.
Did you test if it would be more efficient to check the kvm for SE at the start of kvm_arch_set_irq_inatomic() and immediately return -EWOULDBLOCK there?
It might be slightly more efficient in SE environments to immediately return -EWOULDBLOCK at the start of kvm_arch_set_irq_inatomic(). But the change would be fairly broad, since it would require converting kvm->lock from a mutex to a spinlock so that we can check whether protected virtualization is active with interrupts disabled. I would recommend this as a follow-on if behavior in the field requires it.
+ if (!ret || adapter->masked) {
+ kvm->stat.io_390_inatomic_adapter_masked++;
+ return 0;
+ }
+
+ inti = kzalloc_obj(*inti, GFP_ATOMIC);
+ if (!inti)
+ return -EWOULDBLOCK;
+
+ if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (ret == 0)
This to me seems like the right spot to kvm->stat.io_390_inatomic++ unless I misunderstand the intent of the counter.