Re: [PATCH v3 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
From: Farhan Ali
Date: Tue Nov 18 2025 - 16:49:18 EST
On 11/18/2025 8:13 AM, Tobias Schumacher wrote:
s390 is one of the last architectures using the legacy API for setup and
teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
to the MSI parent domain API. For details, see:
https://lore.kernel.org/lkml/20221111120501.026511281@xxxxxxxxxxxxx
In detail, create an MSI parent domain for zpci which is used by
all PCI devices.
I think this statement would need to be updated with the new approach of MSI parent domain per zpci domain?
When a PCI device sets up MSI or MSI-X IRQs, the
library creates a per-device IRQ domain for this device, which is
used by the device for allocating and freeing IRQs.
The per-device domain delegates this allocation and freeing to the
parent-domain. In the end, the corresponding callbacks of the parent
domain are responsible for allocating and freeing the IRQs.
The allocation is split into two parts:
- zpci_msi_prepare() is called once for each device and allocates the
required resources. On s390, each PCI function has its own airq
vector and a summary bit, which must be configured once per function.
This is done in prepare().
- zpci_msi_alloc() can be called multiple times for allocating one or
more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
number in the kernel and the hardware IRQ number.
Freeing is split into two counterparts:
- zpci_msi_free() reverts the effects of zpci_msi_alloc() and
- zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
callend once when all IRQs are freed before a device is removed.
Since the parent domain in the end allocates the IRQs, the hwirq
encoding must be unambiguous for all IRQs of all devices. This is
achieved by encoding the hwirq using the PCI function id and the MSI
index.
Signed-off-by: Tobias Schumacher <ts@xxxxxxxxxxxxx>
---
arch/s390/Kconfig | 1 +
arch/s390/include/asm/pci.h | 4 +
arch/s390/pci/pci_bus.c | 11 ++
arch/s390/pci/pci_irq.c | 334 +++++++++++++++++++++++++++-----------------
4 files changed, 225 insertions(+), 125 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index df22b10d91415e1ed183cc8add9ad0ac4293c50e..48cd6a12bd04dfe4dd61ecc79d3401ba685c51bb 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -251,6 +251,7 @@ config S390
select HOTPLUG_SMT
select IOMMU_HELPER if PCI
select IOMMU_SUPPORT if PCI
+ select IRQ_MSI_LIB if PCI
select KASAN_VMALLOC if KASAN
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_MERGE_VMAS
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index a32f465ecf73a5cc3408a312d94ec888d62848cc..60abc84cf14fe6fb1ee149df688eea94f0983ed0 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -5,6 +5,7 @@
#include <linux/pci.h>
#include <linux/mutex.h>
#include <linux/iommu.h>
+#include <linux/irqdomain.h>
#include <linux/pci_hotplug.h>
#include <asm/pci_clp.h>
#include <asm/pci_debug.h>
@@ -109,6 +110,7 @@ struct zpci_bus {
struct list_head resources;
struct list_head bus_next;
struct resource bus_resource;
+ struct irq_domain *msi_parent_domain;
int topo; /* TID if topo_is_tid, PCHID otherwise */
int domain_nr;
u8 multifunction : 1;
@@ -310,6 +312,8 @@ int zpci_dma_exit_device(struct zpci_dev *zdev);
/* IRQ */
int __init zpci_irq_init(void);
void __init zpci_irq_exit(void);
+int zpci_create_parent_msi_domain(struct zpci_bus *zbus);
+void zpci_remove_parent_msi_domain(struct zpci_bus *zbus);
/* FMB */
int zpci_fmb_enable_device(struct zpci_dev *);
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index be8c697fea0cc755cfdb4fb0a9e3b95183bec0dc..4849420d4f72c886edbe9c5e32cf0b1513d0b555 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -15,6 +15,7 @@
#include <linux/err.h>
#include <linux/delay.h>
#include <linux/seq_file.h>
+#include <linux/irqdomain.h>
#include <linux/jump_label.h>
#include <linux/pci.h>
#include <linux/printk.h>
@@ -190,6 +191,7 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
{
struct pci_bus *bus;
int domain;
+ int rc;
domain = zpci_alloc_domain((u16)fr->uid);
if (domain < 0)
@@ -199,6 +201,12 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
zbus->multifunction = zpci_bus_is_multifunction_root(fr);
zbus->max_bus_speed = fr->max_bus_speed;
+ rc = zpci_create_parent_msi_domain(zbus);
+ if (rc) {
+ zpci_free_domain(zbus->domain_nr);
+ return -EFAULT;
+ }
+
/*
* Note that the zbus->resources are taken over and zbus->resources
* is empty after a successful call
@@ -206,10 +214,12 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, ops, zbus, &zbus->resources);
if (!bus) {
zpci_free_domain(zbus->domain_nr);
+ zpci_remove_parent_msi_domain(zbus);
return -EFAULT;
}
Nit: I wonder if it would be more readable if we used goto statements here and above.
<..snip..>
+
+/*
+ * Encode the hwirq number for the parent domain. The encoding must be unique
+ * for each IRQ of each device in the parent domain, so it uses the devfn to
+ * identify the device and the msi_index to identify the IRQ within that device.
+ */
+static inline u32 zpci_encode_hwirq(u8 devfn, u16 msi_index)
+{
+ return (devfn << 16) | msi_index;
I see we have changed the encoding from using the fid to using the devfn, any reason for changing it? AFAIU the fid is unique for every function within the system.
Also thinking it out loud, is it this going to be unique if we have multiple IRQ (if nr_irqs in zpci_msi_domain_alloc() is > 1) per MSI descriptor, unless I missed something?