[tip: x86/irq] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

From: tip-bot2 for Thomas Gleixner
Date: Wed Sep 16 2020 - 11:51:28 EST

The following commit has been merged into the x86/irq branch of tip:

Commit-ID: 13b90cadfc294718dd5a89e1fcf103477b01eb50
Gitweb: https://git.kernel.org/tip/13b90cadfc294718dd5a89e1fcf103477b01eb50
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
AuthorDate: Wed, 26 Aug 2020 13:16:32 +02:00
Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitterDate: Wed, 16 Sep 2020 16:52:28 +02:00

genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

The documentation of irq_chip_compose_msi_msg() claims that with
hierarchical irq domains the first chip in the hierarchy which has an
irq_compose_msi_msg() callback is chosen. But the code keeps iterating
after it finds a chip with a compose callback, so the last such chip,
the one closest to the root domain, is the one which gets used.
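A minimal C sketch of the two loop behaviours, under simplifying assumptions. struct level is a hypothetical stand-in for struct irq_data. Its has_compose flag stands in for a non-NULL irq_compose_msi_msg() callback. pick_last() and pick_first() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one level in an irq domain
 * hierarchy. has_compose stands in for
 * "chip && chip->irq_compose_msi_msg != NULL". */
struct level {
	const char *name;
	int has_compose;
	const struct level *parent;	/* NULL at the root domain */
};

/* Pre-fix behaviour: the walk continues after a match, so the match
 * closest to the root domain overwrites any earlier one. */
const struct level *pick_last(const struct level *data)
{
	const struct level *pos = NULL;

	for (; data; data = data->parent)
		if (data->has_compose)
			pos = data;
	return pos;
}

/* Documented behaviour: walking from the device side towards the
 * root, the first level with a callback wins. */
const struct level *pick_first(const struct level *data)
{
	for (; data; data = data->parent)
		if (data->has_compose)
			return data;
	return NULL;
}
```

Take a chain top -> mid -> root in which both mid and root have a callback. pick_first() returns mid, while pick_last() returns root.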

The x86 HPET MSI implementation relies on that behaviour, but that does not
make it more correct.

The message should always be composed at the domain which manages the
underlying resource (e.g. APIC or remap table) because that domain knows
about the required layout of the message.

On X86 the following hierarchies exist:

1) vector -------- PCI/MSI
2) vector -- IR -- PCI/MSI

The vector domain has a different message format than the IR (remapping)
domain. So obviously the PCI/MSI domain can't compose the message without
having knowledge about the parent domain, which is exactly the opposite of
what hierarchical domains want to achieve.

X86 actually has two different PCI/MSI chips where #1 has a compose
callback and #2 does not. #2 delegates the composition to the remap domain
where it belongs, but #1 does it at the PCI/MSI level.
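The two x86 PCI/MSI setups can be modeled roughly as below. This is an illustrative sketch, not the x86 code. struct msg, struct chip, struct level, compose_direct(), compose_remapped() and model_compose() are all hypothetical. A layout string stands in for the real message contents. The vector level gets no callback because, as described above, direct delivery composition currently happens in PCI/MSI chip #1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct msi_msg: records which layout
 * the composing domain produced. */
struct msg {
	const char *layout;
};

/* Hypothetical stand-ins for struct irq_chip and struct irq_data. */
struct chip {
	const char *name;
	void (*compose)(struct msg *m);	/* NULL: no compose callback */
};

struct level {
	const struct chip *chip;
	const struct level *parent;	/* NULL at the vector (root) domain */
};

/* Direct delivery layout, as chip #1 composes it at the PCI/MSI level. */
void compose_direct(struct msg *m)
{
	m->layout = "direct";
}

/* Remapped layout, as the IR domain composes it. */
void compose_remapped(struct msg *m)
{
	m->layout = "remapped";
}

/* First-match walk from the device side towards the root. */
int model_compose(const struct level *data, struct msg *m)
{
	assert(m != NULL);
	for (; data; data = data->parent) {
		if (data->chip && data->chip->compose) {
			data->chip->compose(m);
			return 0;
		}
	}
	return -1;	/* the kernel returns -ENOSYS here */
}
```

Chip #1 sits directly on the vector level and produces the "direct" layout. Chip #2 has no callback, so the walk falls through to the remap level and produces "remapped". Each chain has at most one level with a callback, so the old keep-iterating loop would select the same chips here. That is why the problem only shows up once a callback is added closer to the root.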

For the upcoming device MSI support it's necessary to change this and just
let the first domain which can compose the message take care of it. That
way the top level chip does not have to worry about it and the device MSI
code does not need special knowledge about topologies. It just sets the
compose callback to NULL and lets the hierarchy pick the first chip which
has one.
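A sketch of the intended end state follows. It assumes that the vector domain has gained its own compose callback, which is the plan described above and not what the code does today. All names are hypothetical. device_chip models a device MSI chip that leaves its compose callback NULL and relies on the first-match walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: "layout" records which domain's message
 * format was produced. */
struct msg { const char *layout; };

struct chip {
	void (*compose)(struct msg *m);	/* NULL: defer to a parent */
};

struct level {
	const struct chip *chip;
	const struct level *parent;	/* NULL at the vector (root) domain */
};

/* Assumed end state: the vector domain composes direct delivery
 * messages and the remap domain composes remapped ones. */
void vector_compose(struct msg *m) { m->layout = "vector"; }
void remap_compose(struct msg *m)  { m->layout = "remap"; }

const struct chip vector_chip = { vector_compose };
const struct chip remap_chip  = { remap_compose };

/* A device MSI chip needs no topology knowledge: it simply has no
 * compose callback. */
const struct chip device_chip = { NULL };

/* First-match walk: the closest level with a callback composes. */
int first_match_compose(const struct level *data, struct msg *m)
{
	assert(m != NULL);
	for (; data; data = data->parent) {
		if (data->chip && data->chip->compose) {
			data->chip->compose(m);
			return 0;
		}
	}
	return -1;
}
```

The same device_chip works on top of either hierarchy. Above the vector level it gets the "vector" layout. Above the remap level, the walk stops at remap_compose() and never reaches the vector domain.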

Because of that, the attempt to move the compose callback from the direct
delivery PCI/MSI domain to the vector domain made the system fail to boot
with interrupt remapping enabled: in the remapping case
irq_chip_compose_msi_msg() keeps iterating and chooses the compose callback
of the vector domain, which obviously creates the wrong format for the remap
table.
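The boot failure can be reproduced in the same kind of hypothetical model. Once vector_compose() is attached to the vector level, the pre-fix loop walks past the remap level and calls the vector domain's callback instead. The pre-fix loop is modeled here as last_match_compose(), an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct msi_msg, struct irq_chip and
 * struct irq_data. */
struct msg { const char *layout; };
struct chip { void (*compose)(struct msg *m); };
struct level { const struct chip *chip; const struct level *parent; };

/* Assumed callbacks: the vector one after the attempted move. */
void vector_compose(struct msg *m) { m->layout = "vector"; }
void remap_compose(struct msg *m)  { m->layout = "remap"; }

/* Pre-fix walk: remember the last level with a callback, then call it. */
int last_match_compose(const struct level *data, struct msg *m)
{
	const struct level *pos = NULL;

	assert(m != NULL);
	for (; data; data = data->parent)
		if (data->chip && data->chip->compose)
			pos = data;
	if (!pos)
		return -1;	/* the kernel returns -ENOSYS here */
	pos->chip->compose(m);
	return 0;
}
```

With a device chip above the remap level, last_match_compose() reports success but fills in the "vector" layout. That is a direct delivery message where a remapped one was required.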

Break out of the loop when the first irq chip with a compose callback is
found and fix up the HPET code temporarily. That workaround will be removed
once the direct delivery compose callback is moved to the place where it
belongs in the vector domain.
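The HPET workaround can be sketched the same way. hpet_chip_setup() is a hypothetical helper that mirrors the logic of the hpet_create_irq_domain() hunk. In this model the HPET chip is assumed to carry a direct delivery compose callback. It drops that callback only when an interrupt remapping parent exists, so the first-match walk reaches the remap domain. The vector level has no callback here, matching the state before the planned move.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct msi_msg, struct irq_chip and
 * struct irq_data. */
struct msg { const char *layout; };

struct chip {
	const char *name;
	void (*compose)(struct msg *m);	/* NULL: defer to a parent */
};

struct level { const struct chip *chip; const struct level *parent; };

void direct_compose(struct msg *m) { m->layout = "direct"; }
void remap_compose(struct msg *m)  { m->layout = "remap"; }

/* Hypothetical mirror of the hpet_create_irq_domain() change. */
void hpet_chip_setup(struct chip *hpet, int have_ir_parent)
{
	hpet->name = "HPET-MSI";
	hpet->compose = direct_compose;
	if (have_ir_parent) {
		hpet->name = "IR-HPET-MSI";
		/* Temporary fix: let the remap domain compose the message. */
		hpet->compose = NULL;
	}
}

/* First-match walk, as in the patched irq_chip_compose_msi_msg(). */
int first_match_compose(const struct level *data, struct msg *m)
{
	assert(m != NULL);
	for (; data; data = data->parent) {
		if (data->chip && data->chip->compose) {
			data->chip->compose(m);
			return 0;
		}
	}
	return -1;
}
```

Without the NULL assignment, the first-match walk would stop at the HPET chip itself in the remapping case and produce the direct layout.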

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Reviewed-by: Marc Zyngier <maz@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20200826112331.047917603@xxxxxxxxxxxxx

arch/x86/kernel/apic/msi.c | 7 +++++--
kernel/irq/chip.c | 9 ++++-----
kernel/irq/internals.h | 9 +++++++++
3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index c2b2911..7f7bc6a 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id)
 	info.type = X86_IRQ_ALLOC_TYPE_HPET;
 	info.hpet_id = hpet_id;
 	parent = irq_remapping_get_ir_irq_domain(&info);
-	if (parent == NULL)
+	if (parent == NULL) {
 		parent = x86_vector_domain;
-	else
+	} else {
 		hpet_msi_controller.name = "IR-HPET-MSI";
+		/* Temporary fix: Will go away */
+		hpet_msi_controller.irq_compose_msi_msg = NULL;
+	}
 
 	fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 857f5f4..0ae308e 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1541,18 +1541,17 @@ EXPORT_SYMBOL_GPL(irq_chip_release_resources_parent);
  */
 int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
-	struct irq_data *pos = NULL;
+	struct irq_data *pos;
 
-#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
-	for (; data; data = data->parent_data)
-#endif
+	for (pos = NULL; !pos && data; data = irqd_get_parent_data(data)) {
 		if (data->chip && data->chip->irq_compose_msi_msg)
 			pos = data;
+	}
+
 	if (!pos)
 		return -ENOSYS;
 
 	pos->chip->irq_compose_msi_msg(pos, msg);
-
 	return 0;
 }

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 7db284b..5436352 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -473,6 +473,15 @@ static inline void irq_domain_deactivate_irq(struct irq_data *data)

+static inline struct irq_data *irqd_get_parent_data(struct irq_data *irqd)
+{
+#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
+	return irqd->parent_data;
+#else
+	return NULL;
+#endif
+}
+
 #include <linux/debugfs.h>
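The loop in the chip.c hunk and the irqd_get_parent_data() helper from internals.h can be exercised outside the kernel with the sketch below. It uses hypothetical stand-ins for the kernel names:

- MODEL_HIERARCHY stands in for CONFIG_IRQ_DOMAIN_HIERARCHY.
- struct idata stands in for struct irq_data.
- get_parent_data() stands in for irqd_get_parent_data().
- find_compose_level() stands in for the lookup part of irq_chip_compose_msi_msg().

The !pos test in the loop condition is what ends the walk right after the first match. The helper lets the same loop compile when the parent_data member does not exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: set to 0 to model a kernel built without
 * CONFIG_IRQ_DOMAIN_HIERARCHY. */
#define MODEL_HIERARCHY 1

struct chip { void (*compose)(void); };

/* Hypothetical stand-in for struct irq_data. */
struct idata {
	const struct chip *chip;
#if MODEL_HIERARCHY
	struct idata *parent_data;
#endif
};

/* Placeholder callback; only its (non-NULL) address matters here. */
void noop_compose(void) { }

/* Mirrors irqd_get_parent_data(): without hierarchy support there is
 * no parent pointer, so the walk ends after the first level. */
static inline struct idata *get_parent_data(struct idata *d)
{
#if MODEL_HIERARCHY
	return d->parent_data;
#else
	(void)d;
	return NULL;
#endif
}

/* Same loop shape as the patched irq_chip_compose_msi_msg(): once pos
 * is set, the "!pos" part of the condition stops the walk. */
struct idata *find_compose_level(struct idata *data)
{
	struct idata *pos;

	for (pos = NULL; !pos && data; data = get_parent_data(data)) {
		if (data->chip && data->chip->compose)
			pos = data;
	}
	return pos;
}
```

Folding the first-match check into the for condition appears to keep a single path to the shared `if (!pos) return -ENOSYS;` handling. Moving the #ifdef into an inline helper takes the conditional compilation out of the loop itself.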