Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq migration.

From: l . genoni
Date: Sat Feb 03 2007 - 09:32:31 EST


As I reported when I tested this patch, it works, but I saw an abnormally high load average while the error message was being triggered. Anyway, a load average three or four times higher than expected is better than a crash/reboot, isn't it? :)

Luigi Genoni

p.s.
I will test the other, definitive patch on Monday on both 8- and 16-CPU systems.

On Sat, 3 Feb 2007, Eric W. Biederman wrote:

Date: Sat, 03 Feb 2007 00:55:11 -0700
From: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
To: Arjan van de Ven <arjan@xxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx,
"Lu, Yinghai" <yinghai.lu@xxxxxxx>,
Luigi Genoni <luigi.genoni@xxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxx>,
Natalie Protasevich <protasnb@xxxxxxxxx>, Andi Kleen <ak@xxxxxxx>
Subject: Re: [PATCH 2/2] x86_64 irq: Handle irqs pending in IRR during irq
migration.
Resent-Date: Sat, 03 Feb 2007 09:05:10 +0100
Resent-From: <l.genoni@xxxxxx>

Arjan van de Ven <arjan@xxxxxxxxxxxxx> writes:

Once the migration operation is complete we know we will receive
no more interrupts on this vector so the irq pending state for
this irq will no longer be updated. If the irq is not pending and
we are in the intermediate state we immediately free the vector,
otherwise we free the vector in do_IRQ when the pending irq
arrives.

So is this a for-2.6.20 thing? The bug was present in 2.6.19, so
I assume it doesn't affect many people?

I got a few reports of this; irqbalance may trigger this kernel bug it
seems... I would suggest to consider this for 2.6.20 since it's a
hard-hang case


Yes. The bug I fixed will not happen if you don't migrate irqs.

At the very least we want the patch below (already in -mm)
that makes it not a hard hang case.

Subject: [PATCH] x86_64: Survive having no irq mapping for a vector

Occasionally the kernel has bugs that result in no irq being
found for a given cpu vector. If we acknowledge the irq
the system has a good chance of continuing even though we dropped
and missed an irq message. If we continue to simply print a
message and drop and not acknowledge the irq the system is
likely to become non-responsive shortly thereafter.

Signed-off-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
---
arch/x86_64/kernel/irq.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86_64/kernel/irq.c b/arch/x86_64/kernel/irq.c
index 0c06af6..648055a 100644
--- a/arch/x86_64/kernel/irq.c
+++ b/arch/x86_64/kernel/irq.c
@@ -120,9 +120,14 @@ asmlinkage unsigned int do_IRQ(struct pt_regs *regs)

 	if (likely(irq < NR_IRQS))
 		generic_handle_irq(irq);
-	else if (printk_ratelimit())
-		printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
-			__func__, smp_processor_id(), vector);
+	else {
+		if (!disable_apic)
+			ack_APIC_irq();
+
+		if (printk_ratelimit())
+			printk(KERN_EMERG "%s: %d.%d No irq handler for vector\n",
+				__func__, smp_processor_id(), vector);
+	}
 
 	irq_exit();

--
1.4.4.1.g278f
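
For illustration only, the control flow of the patched branch can be
modeled in user space. This is a rough sketch, not kernel code: struct
model_cpu, model_do_irq() and MODEL_NR_IRQS are hypothetical names made
up for this example, and the counters merely stand in for the calls to
generic_handle_irq() and ack_APIC_irq() shown in the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary bound for this model; not the kernel's NR_IRQS value. */
#define MODEL_NR_IRQS 224u

/* Hypothetical stand-in for per-cpu state; none of this is kernel API. */
struct model_cpu {
	bool apic_disabled;	/* plays the role of disable_apic */
	int acks;		/* times the unmapped-vector path "acked" */
	int handled;		/* irqs passed on to a handler */
	int dropped;		/* vectors that had no irq mapping */
};

/*
 * Model of the patched branch: a vector with no irq mapping is still
 * acknowledged (when the APIC is in use) before it is dropped, instead
 * of being dropped without an acknowledgement.
 */
static void model_do_irq(struct model_cpu *cpu, unsigned int irq)
{
	assert(cpu != NULL);

	if (irq < MODEL_NR_IRQS) {
		cpu->handled++;		/* stands in for generic_handle_irq() */
	} else {
		if (!cpu->apic_disabled)
			cpu->acks++;	/* stands in for ack_APIC_irq() */
		cpu->dropped++;		/* the real code prints a rate-limited message */
	}
}
```

Calling model_do_irq() with an irq at or above MODEL_NR_IRQS increments
both acks and dropped unless apic_disabled is set, which is the
behaviour the commit message describes: the irq is still lost, but it
is acknowledged first.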

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
