kernel BUG at tg3.c:1557

From: Roland Kuhn (rkuhn@e18.physik.tu-muenchen.de)
Date: Wed Aug 07 2002 - 06:40:43 EST


Dear kernel hackers!

I've been chasing this bug for weeks and have run out of ideas, so please
tell me your thoughts about it:

On a dual Athlon MP with a 3ware-7850 RAID (640GB RAID-5) and a 3C996B-T GE
NIC, I can crash the machine with the above BUG message in virtually no
time simply by copying data both ways between the RAID and the NIC. The
BUG message shows that this can happen at any time; it doesn't matter
whether the interrupt is received in cpu_idle or somewhere else. I tried
noapic, but to no avail.

Does anybody know about this problem?
How can I get more debugging information?
Can the driver be patched to gracefully handle this situation, e.g. by
resetting the card and trying again?

All I have found out so far is that the kernel's and the NIC's views of
the world seem to be inconsistent :-(

For our application, stability is much more important than a few TCP
retransmits...

Thanks in advance,
                                        Roland

+---------------------------+-------------------------+
| TU Muenchen | |
| Physik-Department E18 | Raum 3558 |
| James-Franck-Str. | Telefon 089/289-12592 |
| 85747 Garching | |
+---------------------------+-------------------------+



