Re: [PATCH] ARM: irq: Add IRQ_SET_MASK_OK_DONE handling in migrate_one_irq()

From: Marc Zyngier
Date: Tue Jan 08 2019 - 09:16:40 EST


Hi Dietmar,

On 08/01/2019 13:58, Dietmar Eggemann wrote:
> Arm TC2 (multi_v7_defconfig plus CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and
> CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y) fails hotplug stress tests.
>
> This issue was tracked down to a missing copy of the new affinity
> cpumask of the vexpress-spc interrupt into struct
> irq_common_data.affinity when the interrupt is migrated in
> migrate_one_irq().
>
> Commit 0407daceedfe ("irqchip/gic: Return IRQ_SET_MASK_OK_DONE in the
> set_affinity method") changed the return value of the irq_set_affinity()
> function of the GIC from IRQ_SET_MASK_OK to IRQ_SET_MASK_OK_DONE.
>
> In migrate_one_irq(), if the current irq affinity mask and the cpu
> online mask do not share any CPU, the affinity mask is set to the cpu
> online mask. In this case (ret == true), and when the irq chip
> function irq_set_affinity() returns successfully (IRQ_SET_MASK_OK),
> struct irq_common_data.affinity should also be updated.
>
> Add IRQ_SET_MASK_OK_DONE next to IRQ_SET_MASK_OK when checking that the
> irq chip function irq_set_affinity() returns successfully.
>
> Commit 2cb625478f8c ("genirq: Add IRQ_SET_MASK_OK_DONE to support
> stacked irqchip") only added IRQ_SET_MASK_OK_DONE handling to
> irq_do_set_affinity() in the irq core and not to the Arm32 irq code.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> ---
>
> The hotplug issue on Arm TC2 happens because the vexpress-spc interrupt
> (irq=22) is affine only to CPU0. This happens because it is set up early,
> when cpu_online_mask still contains only CPU0.
> But the problem with the missing copy of the affinity mask should occur
> with every interrupt which is forced to migrate.
>
> With additional debug in irq_setup_affinity():
>
> [0.000619] irq_setup_affinity(): irq=17 mask=0 cpu_online_mask=0 set=0-4
> [0.007065] irq_setup_affinity(): irq=22 mask=0 cpu_online_mask=0 set=0-4
> [3.372907] irq_setup_affinity(): irq=47 mask=0-4 cpu_online_mask=0-4 set=0-4
>
> cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3       CPU4
>  22:        316          0          0          0          0     GIC-0 127 Level     vexpress-spc
>
> cat /proc/irq/22/smp_affinity_list
> 0
>
> arch/arm/kernel/irq.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
> index 9908dacf9229..ddb828b0235b 100644
> --- a/arch/arm/kernel/irq.c
> +++ b/arch/arm/kernel/irq.c
> @@ -117,6 +117,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
>  	const struct cpumask *affinity = irq_data_get_affinity_mask(d);
>  	struct irq_chip *c;
>  	bool ret = false;
> +	int ret2;
>
>  	/*
>  	 * If this is a per-CPU interrupt, or the affinity does not
> @@ -131,9 +132,14 @@ static bool migrate_one_irq(struct irq_desc *desc)
>  	}
>
>  	c = irq_data_get_irq_chip(d);
> -	if (!c->irq_set_affinity)
> +	if (!c->irq_set_affinity) {
>  		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
> -	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
> +		return ret;
> +	}
> +
> +	ret2 = c->irq_set_affinity(d, affinity, false);
> +
> +	if ((ret2 == IRQ_SET_MASK_OK || ret2 == IRQ_SET_MASK_OK_DONE) && ret)
>  		cpumask_copy(irq_data_get_affinity_mask(d), affinity);
>
>  	return ret;
>

On the arm64 side, we've solved the exact same issue by getting rid of
this code and using the generic implementation. See 217d453d473c5
("arm64: fix a migrating irq bug when hotplug cpu"), which uses
irq_migrate_all_off_this_cpu instead.

I'm not sure there is much value in not using the core code in this case.

Thanks,

M.
--
Jazz is not dead. It just smells funny...