Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq

From: Marc Zyngier
Date: Fri May 29 2020 - 04:32:38 EST


Hi Ali,

On 2020-05-29 02:55, Ali Saidi wrote:
If an interrupt is disabled the ITS driver has sent a discard removing
the DeviceID and EventID from the ITT. After this occurs it can't be
moved to another collection with a MOVI and a command error occurs if
attempted. Before issuing the MOVI command make sure that the IRQ isn't
disabled and change the activate code to try and use the previous
affinity.

Signed-off-by: Ali Saidi <alisaidi@xxxxxxxxxx>
---
drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 124251b0ccba..1235dd9a2fb2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
const struct cpumask *mask_val,
/* don't set the affinity when the target cpu is same as current one */
if (cpu != its_dev->event_map.col_map[id]) {
target_col = &its_dev->its->collections[cpu];
- its_send_movi(its_dev, target_col, id);
+
+ /* If the IRQ is disabled a discard was sent so don't move */
+ if (!irqd_irq_disabled(d))
+ its_send_movi(its_dev, target_col, id);
+

This looks wrong. What you are testing here is whether the interrupt
is masked, not that there isn't a valid translation.

In the commit message, you're saying that we've issued a discard. This
hints at doing a set_affinity on an interrupt that has been deactivated
(mapping removed). Is that actually the case? If so, why was it deactivated
in the first place?
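
To spell out the distinction, here is a toy user-space sketch. It is not
driver code: struct toy_lpi and every toy_lpi_* helper are hypothetical
names invented for illustration, and the model simply assumes that masking
and having a translation are two independent bits of state.

```c
#include <stdbool.h>

/* Toy model only; these are NOT the kernel's data structures. */
struct toy_lpi {
	bool masked;	/* stands in for what irqd_irq_disabled() reports */
	bool mapped;	/* stands in for "a DeviceID/EventID translation exists" */
};

/* Assumption: disabling only masks, the translation stays in place. */
void toy_lpi_disable(struct toy_lpi *l)
{
	l->masked = true;
}

/* Assumption: deactivating is what issues the discard. */
void toy_lpi_deactivate(struct toy_lpi *l)
{
	l->mapped = false;
}

/* The condition the patch tests before sending a MOVI. */
bool toy_lpi_patch_allows_movi(const struct toy_lpi *l)
{
	return !l->masked;
}

/* The condition that decides whether a MOVI can work in this model. */
bool toy_lpi_movi_can_succeed(const struct toy_lpi *l)
{
	return l->mapped;
}
```

In this model a masked but still mapped interrupt would have its MOVI
skipped needlessly, while an unmasked but unmapped one would still get a
MOVI and fail.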

its_dev->event_map.col_map[id] = cpu;
irq_data_update_effective_affinity(d, cpumask_of(cpu));
}
@@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct
irq_domain *domain,
if (its_dev->its->numa_node >= 0)
cpu_mask = cpumask_of_node(its_dev->its->numa_node);

- /* Bind the LPI to the first possible CPU */
- cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+ /* If the cpu set to a different CPU that is still online use it */
+ cpu = its_dev->event_map.col_map[event];
+
+ cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
+
+ if (!cpumask_test_cpu(cpu, cpu_mask)) {
+ /* Bind the LPI to the first possible CPU */
+ cpu = cpumask_first(cpu_mask);
+ }
+
if (cpu >= nr_cpu_ids) {
if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
return -EINVAL;

So you deactivate an interrupt, do a set_affinity that doesn't issue
a MOVI but preserves the affinity, then reactivate it and hope that
the new mapping will target the "right" CPU.
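
A rough user-space sketch of that sequence, as I read the patch. All of
the toy_event_* names, TOY_NR_CPUS and the hw_target field are made up for
illustration, and the "mapped" test is only a stand-in for whatever check
ends up guarding the MOVI:

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy model only; this is NOT the driver's event_map. */
struct toy_event {
	bool mapped;	/* translation present */
	int col_map;	/* software's record of the target CPU */
	int hw_target;	/* CPU the modelled mapping points at, -1 if none */
};

/* Models deactivate: the translation goes away, col_map is left alone. */
void toy_event_deactivate(struct toy_event *e)
{
	e->mapped = false;
	e->hw_target = -1;
}

/* Models set_affinity: MOVI only if mapped, col_map always updated. */
void toy_event_set_affinity(struct toy_event *e, int cpu)
{
	if (cpu == e->col_map)
		return;
	if (e->mapped)
		e->hw_target = cpu;	/* stands in for the MOVI */
	e->col_map = cpu;
}

/* Models activate: reuse col_map if online, else the first online CPU. */
int toy_event_activate(struct toy_event *e, const bool online[TOY_NR_CPUS])
{
	int cpu = e->col_map;

	if (cpu < 0 || cpu >= TOY_NR_CPUS || !online[cpu]) {
		cpu = -1;
		for (int i = 0; i < TOY_NR_CPUS; i++) {
			if (online[i]) {
				cpu = i;
				break;
			}
		}
	}
	if (cpu < 0)
		return -1;	/* nothing online to bind to */

	e->col_map = cpu;
	e->hw_target = cpu;	/* stands in for the new mapping */
	e->mapped = true;
	return 0;
}
```

In this model the CPU that activate ends up with depends entirely on a
col_map value recorded while no translation existed.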

That seems a bit mad, but I presume this isn't the whole story...

Thanks,

M.
--
Jazz is not dead. It just smells funny...