Re: [patch V2 04/46] genirq/chip: Use the first chip in irq_chip_compose_msi_msg()

From: Marc Zyngier
Date: Wed Aug 26 2020 - 15:50:39 EST


On Wed, 26 Aug 2020 12:16:32 +0100,
Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> The documentation of irq_chip_compose_msi_msg() claims that with
> hierarchical irq domains the first chip in the hierarchy which has an
> irq_compose_msi_msg() callback is chosen. But the code just keeps
> iterating after it finds a chip with a compose callback.
>
> The x86 HPET MSI implementation relies on that behaviour, but that does not
> make it more correct.
>
> The message should always be composed at the domain which manages the
> underlying resource (e.g. APIC or remap table) because that domain knows
> about the required layout of the message.
>
> On X86 the following hierarchies exist:
>
> 1) vector -------- PCI/MSI
> 2) vector -- IR -- PCI/MSI
>
> The vector domain has a different message format than the IR (remapping)
> domain. So obviously the PCI/MSI domain can't compose the message without
> having knowledge about the parent domain, which is exactly the opposite of
> what hierarchical domains want to achieve.
>
> X86 actually has two different PCI/MSI chips where #1 has a compose
> callback and #2 does not. #2 delegates the composition to the remap domain
> where it belongs, but #1 does it at the PCI/MSI level.
>
> For the upcoming device MSI support it's necessary to change this and just
> let the first domain which can compose the message take care of it. That
> way the top level chip does not have to worry about it and the device MSI
> code does not need special knowledge about topologies. It just sets the
> compose callback to NULL and lets the hierarchy pick the first chip which
> has one.
>
> Due to that the attempt to move the compose callback from the direct
> delivery PCI/MSI domain to the vector domain made the system fail to boot
> with interrupt remapping enabled because in the remapping case
> irq_chip_compose_msi_msg() keeps iterating and choses the compose callback
> of the vector domain which obviously creates the wrong format for the remap
> table.
>
> Break out of the loop when the first irq chip with a compose callback is
> found and fixup the HPET code temporarily. That workaround will be removed
> once the direct delivery compose callback is moved to the place where it
> belongs in the vector domain.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> V2: New patch. Note, that this might break other stuff which relies on the
> current behaviour, but the hierarchy composition of DT based chips is
> really hard to follow.

Grepping around, I don't think there is any occurrence of two irqchips
providing irq_compose_msi() that can share a hierarchy on any real
system, so we should be fine. Famous last words.

> ---
> arch/x86/kernel/apic/msi.c | 7 +++++--
> kernel/irq/chip.c | 12 +++++++++---
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/kernel/apic/msi.c
> +++ b/arch/x86/kernel/apic/msi.c
> @@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai
> info.type = X86_IRQ_ALLOC_TYPE_HPET;
> info.hpet_id = hpet_id;
> parent = irq_remapping_get_ir_irq_domain(&info);
> - if (parent == NULL)
> + if (parent == NULL) {
> parent = x86_vector_domain;
> - else
> + } else {
> hpet_msi_controller.name = "IR-HPET-MSI";
> + /* Temporary fix: Will go away */
> + hpet_msi_controller.irq_compose_msi_msg = NULL;
> + }
>
> fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
> hpet_id);
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_
> struct irq_data *pos = NULL;
>
> #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
> - for (; data; data = data->parent_data)
> -#endif
> - if (data->chip && data->chip->irq_compose_msi_msg)
> + for (; data; data = data->parent_data) {
> + if (data->chip && data->chip->irq_compose_msi_msg) {
> pos = data;
> + break;
> + }
> + }
> +#else
> + if (data->chip && data->chip->irq_compose_msi_msg)
> + pos = data;
> +#endif
> if (!pos)
> return -ENOSYS;
>
>
>

Is it just me, or is this last change more complex than it ought to be?

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 857f5f4c8098..25e18b73699c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
struct irq_data *pos = NULL;

#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
- for (; data; data = data->parent_data)
+ for (; data && !pos; data = data->parent_data)
#endif
if (data->chip && data->chip->irq_compose_msi_msg)
pos = data;

Though the for loop in a #ifdef in admittedly an acquired taste...

Reviewed-by: Marc Zyngier <maz@xxxxxxxxxx>

M.

--
Without deviation from the norm, progress is not possible.