Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interruption

From: Yinghai Lu
Date: Wed Apr 08 2009 - 18:30:30 EST


On Wed, Apr 8, 2009 at 2:07 PM, Gary Hade <garyhade@xxxxxxxxxx> wrote:
> Impact: Eliminates a race that can leave the system in an
>        unusable state
>
> During rapid offlining of multiple CPUs there is a chance
> that an IRQ affinity move destination CPU will be offlined
> before the IRQ affinity move initiated during the offlining
> of a previous CPU completes.  This can happen when the device
> is not very active and thus fails to generate the IRQ that is
> needed to complete the IRQ affinity move before the move
> destination CPU is offlined.  When this happens there is an
> -EBUSY return from __assign_irq_vector() during the offlining
> of the IRQ move destination CPU which prevents initiation of
> a new IRQ affinity move operation to an online CPU.  This
> leaves the IRQ affinity set to an offlined CPU.
>
> I have been able to reproduce the problem on some of our
> systems using the following script.  When the system is idle
> the problem often reproduces during the first CPU offlining
> sequence.
>
> #!/bin/sh
>
> SYS_CPU_DIR=/sys/devices/system/cpu
> VICTIM_IRQ=25
> IRQ_MASK=f0
>
> iteration=0
> while true; do
>  echo $iteration
>  echo $IRQ_MASK > /proc/irq/$VICTIM_IRQ/smp_affinity
>  for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
>    echo 0 > $cpudir/online
>  done
>  for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
>    echo 1 > $cpudir/online
>  done
>  iteration=`expr $iteration + 1`
> done
>
> The proposed fix takes advantage of the fact that when all
> CPUs in the old domain are offline there is nothing to be done
> by send_cleanup_vector() during the affinity move completion.
> So, we simply avoid setting cfg->move_in_progress preventing
> the above mentioned -EBUSY return from __assign_irq_vector().
> This allows initiation of a new IRQ affinity move to a CPU
> that is not going offline.
>
> Signed-off-by: Gary Hade <garyhade@xxxxxxxxxx>
>
> ---
>  arch/x86/kernel/apic/io_apic.c |   11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c
> ===================================================================
> --- linux-2.6.30-rc1.orig/arch/x86/kernel/apic/io_apic.c        2009-04-08 09:23:00.000000000 -0700
> +++ linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c     2009-04-08 09:23:16.000000000 -0700
> @@ -363,7 +363,8 @@ set_extra_move_desc(struct irq_desc *des
>        struct irq_cfg *cfg = desc->chip_data;
>
>        if (!cfg->move_in_progress) {
> -               /* it means that domain is not changed */
> +               /* it means that domain has not changed or all CPUs
> +                * in old domain are offline */
>                if (!cpumask_intersects(desc->affinity, mask))
>                        cfg->move_desc_pending = 1;
>        }
> @@ -1262,8 +1263,11 @@ next:
>                current_vector = vector;
>                current_offset = offset;
>                if (old_vector) {
> -                       cfg->move_in_progress = 1;
>                        cpumask_copy(cfg->old_domain, cfg->domain);
> +                       if (cpumask_intersects(cfg->old_domain,
> +                                              cpu_online_mask)) {
> +                               cfg->move_in_progress = 1;
> +                       }
>                }
>                for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
>                        per_cpu(vector_irq, new_cpu)[vector] = irq;
> @@ -2492,7 +2496,8 @@ static void irq_complete_move(struct irq
>                if (likely(!cfg->move_desc_pending))
>                        return;
>
> -               /* domain has not changed, but affinity did */
> +               /* domain has not changed or all CPUs in old domain
> +                * are offline, but affinity changed */
>                me = smp_processor_id();
>                if (cpumask_test_cpu(me, desc->affinity)) {
>                        *descp = desc = move_irq_desc(desc, me);
> --

So you mean cpu_online_mask gets updated while __assign_irq_vector() is
running? With your patch, what if the update happens right after you
check it that second time?

It seems we are missing a lock_vector_lock() around removing the CPU
from the online mask.
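
To make the window in question concrete, here is a minimal user-space
sketch, not kernel code. It assumes a toy model in which each mask is a
plain 64-bit word with one bit per CPU; the names cpumask_model_t,
irq_cfg_model, masks_intersect(), begin_move() and move_flag_is_stale()
are hypothetical and exist only for this illustration. begin_move()
follows the patched hunk: it sets move_in_progress only if some CPU in
the old domain is still online. move_flag_is_stale() then shows that the
decision can be invalidated if the online mask changes after the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, not the kernel's struct cpumask. */
typedef uint64_t cpumask_model_t;

/* Hypothetical stand-in for the few irq_cfg fields the patch touches. */
struct irq_cfg_model {
    cpumask_model_t domain;
    cpumask_model_t old_domain;
    bool move_in_progress;
};

/* True if the two masks share at least one CPU. */
static bool masks_intersect(cpumask_model_t a, cpumask_model_t b)
{
    return (a & b) != 0;
}

/*
 * Models the patched hunk: remember the old domain, and flag a move as
 * in progress only when some CPU of the old domain is still online.
 * 'online' is the snapshot of the online mask seen at check time.
 */
static void begin_move(struct irq_cfg_model *cfg,
                       cpumask_model_t new_domain,
                       cpumask_model_t online)
{
    cfg->old_domain = cfg->domain;
    if (masks_intersect(cfg->old_domain, online))
        cfg->move_in_progress = true;
    cfg->domain = new_domain;
}

/*
 * True if move_in_progress is set although, according to the current
 * online mask, no CPU of the old domain is left to perform the cleanup.
 */
static bool move_flag_is_stale(const struct irq_cfg_model *cfg,
                               cpumask_model_t online)
{
    return cfg->move_in_progress &&
           !masks_intersect(cfg->old_domain, online);
}
```

In this model, if the old-domain CPU is cleared from the online mask
after begin_move() has sampled it, the flag stays set with nobody left
to clear it. That is the case that holding a lock across both the check
and the online mask update would be expected to rule out.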

YH