Re: 2.6.20->2.6.21 - networking dies after random time

From: Jarek Poplawski
Date: Tue Aug 07 2007 - 08:13:18 EST


On Tue, Aug 07, 2007 at 11:52:46AM +0200, Jarek Poplawski wrote:
> On Tue, Aug 07, 2007 at 11:37:01AM +0200, Marcin Ślusarz wrote:
> > 2007/8/7, Jarek Poplawski <jarkao2@xxxxx>:
> > > On Tue, Aug 07, 2007 at 09:46:36AM +0200, Marcin Ślusarz wrote:
> > > > Network card still locks up (tested on 2.6.22.1). I had to upload more
> > > > data than usual (~350 MB vs ~1-100 MB) to trigger that bug but it
> > > > might be a coincidence...
> > >
> > > Thanks! That's good news after all - it would be really strange if
> > > this place didn't hit more people (it seems there is some safety
> > > elsewhere for this).
> > >
> > > BTW: I hope Thomas' previous patch to resend.c (with Ingo's warning)
> > > had no problems under a similar load?
> > I always tested with a 500-600 MB "dataset"
> >
> > > PS: Marcin, if you need a break in this testing let us know!
> > No, I don't need a break. I'll have more time in the next few weeks.
>
> Great! So, I'll try to send a patch with _SW_RESEND in a few hours,
> if Ingo doesn't prepare something for you.

So, let's try this idea next: a modified version of Ingo's "x86: activate
HARDIRQS_SW_RESEND" patch.
(Don't forget to run make oldconfig before make.)
For testing only.

Cheers,
Jarek P.

PS: alas, there was no time even for compile testing...

---

diff -Nurp 2.6.22.1-/arch/i386/Kconfig 2.6.22.1/arch/i386/Kconfig
--- 2.6.22.1-/arch/i386/Kconfig 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.22.1/arch/i386/Kconfig 2007-08-07 13:13:03.000000000 +0200
@@ -1252,6 +1252,10 @@ config GENERIC_PENDING_IRQ
 	depends on GENERIC_HARDIRQS && SMP
 	default y

+config HARDIRQS_SW_RESEND
+	bool
+	default y
+
 config X86_SMP
 	bool
 	depends on SMP && !X86_VOYAGER
diff -Nurp 2.6.22.1-/arch/x86_64/Kconfig 2.6.22.1/arch/x86_64/Kconfig
--- 2.6.22.1-/arch/x86_64/Kconfig 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.22.1/arch/x86_64/Kconfig 2007-08-07 13:13:03.000000000 +0200
@@ -690,6 +690,10 @@ config GENERIC_PENDING_IRQ
 	depends on GENERIC_HARDIRQS && SMP
 	default y

+config HARDIRQS_SW_RESEND
+	bool
+	default y
+
 menu "Power management options"

 source kernel/power/Kconfig
diff -Nurp 2.6.22.1-/kernel/irq/manage.c 2.6.22.1/kernel/irq/manage.c
--- 2.6.22.1-/kernel/irq/manage.c 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.22.1/kernel/irq/manage.c 2007-08-07 13:13:03.000000000 +0200
@@ -169,6 +169,14 @@ void enable_irq(unsigned int irq)
 		desc->depth--;
 	}
 	spin_unlock_irqrestore(&desc->lock, flags);
+#ifdef CONFIG_HARDIRQS_SW_RESEND
+	/*
+	 * Do a bh disable/enable pair to trigger any pending
+	 * irq resend logic:
+	 */
+	local_bh_disable();
+	local_bh_enable();
+#endif
 }
 EXPORT_SYMBOL(enable_irq);

diff -Nurp 2.6.22.1-/kernel/irq/resend.c 2.6.22.1/kernel/irq/resend.c
--- 2.6.22.1-/kernel/irq/resend.c 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.22.1/kernel/irq/resend.c 2007-08-07 13:57:54.000000000 +0200
@@ -62,16 +62,24 @@ void check_irq_resend(struct irq_desc *d
 	 */
 	desc->chip->enable(irq);

+	/*
+	 * Temporary hack to figure out more about the problem, which
+	 * is causing the ancient network cards to die.
+	 */

 	if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
 		desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;

-		if (!desc->chip || !desc->chip->retrigger ||
-					!desc->chip->retrigger(irq)) {
+		if (desc->handle_irq == handle_edge_irq) {
+			if (desc->chip->retrigger)
+				desc->chip->retrigger(irq);
+			return;
+		}
 #ifdef CONFIG_HARDIRQS_SW_RESEND
-			/* Set it pending and activate the softirq: */
-			set_bit(irq, irqs_resend);
-			tasklet_schedule(&resend_tasklet);
+		WARN_ON_ONCE(1);
+		/* Set it pending and activate the softirq: */
+		set_bit(irq, irqs_resend);
+		tasklet_schedule(&resend_tasklet);
 #endif
-		}
 	}
 }
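
For anyone who wants to see the intended control flow without reading
the hunks side by side, here is a rough user-space model of what the
patched check_irq_resend() and enable_irq() are meant to do. It is only
a sketch, not kernel code: every model_* name is made up for
illustration, the "tasklet" is an ordinary function call, and it assumes
CONFIG_HARDIRQS_SW_RESEND is enabled. Edge-handled lines get a hardware
retrigger (if the chip has one) and return; everything else is marked in
a bitmap and replayed from the tasklet, which the bh disable/enable pair
in enable_irq() is supposed to kick.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS 16

/* Hypothetical stand-in for "desc->handle_irq == handle_edge_irq". */
enum model_flow { MODEL_FLOW_EDGE, MODEL_FLOW_LEVEL };

struct model_irq {
	enum model_flow flow;	/* which flow handler the line uses */
	bool pending;		/* models IRQ_PENDING */
	bool replay;		/* models IRQ_REPLAY */
	bool has_retrigger;	/* models chip->retrigger != NULL */
	int hw_retriggers;	/* times the chip was asked to resend */
	int handled;		/* times the handler ran via software resend */
};

static struct model_irq model_irqs[MODEL_NR_IRQS];
static unsigned int model_resend_mask;	/* models the irqs_resend bitmap */
static bool model_tasklet_scheduled;	/* models tasklet_schedule() */
static bool model_warned;		/* models WARN_ON_ONCE(1) firing */

/* Models check_irq_resend() as modified by the patch. */
static void model_check_irq_resend(unsigned int irq)
{
	struct model_irq *d = &model_irqs[irq];

	/* Only act when the irq is pending and not already being replayed. */
	if (!d->pending || d->replay)
		return;
	d->pending = false;
	d->replay = true;

	if (d->flow == MODEL_FLOW_EDGE) {
		/* Edge lines: hardware retrigger only, no software resend. */
		if (d->has_retrigger)
			d->hw_retriggers++;
		return;
	}

	/* Everything else: note the warning, mark the irq, schedule replay. */
	model_warned = true;
	model_resend_mask |= 1u << irq;
	model_tasklet_scheduled = true;
}

/* Models the resend tasklet: run the handler for every marked irq. */
static void model_run_resend_tasklet(void)
{
	unsigned int irq;

	if (!model_tasklet_scheduled)
		return;
	model_tasklet_scheduled = false;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
		if (!(model_resend_mask & (1u << irq)))
			continue;
		model_resend_mask &= ~(1u << irq);
		model_irqs[irq].handled++;
		/* In this model the replay flag is cleared once the handler ran. */
		model_irqs[irq].replay = false;
	}
}

/* Models the tail of enable_irq() with the patch applied. */
static void model_enable_irq(unsigned int irq)
{
	model_check_irq_resend(irq);
	/* Stands in for the local_bh_disable(); local_bh_enable(); pair. */
	model_run_resend_tasklet();
}
```

In this model, calling model_enable_irq() on a pending level-type line
runs its handler once through the software path, while a pending
edge-type line only bumps the hardware retrigger counter.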