TLB IPI Wait Errors & APIC interrupt handling

From: Amy Abascal-Turner (amy@vasoftware.com)
Date: Fri Dec 14 2001 - 17:30:38 EST


Hi!

VA Linux (Software) is in dire need of some kernel help.
If you know anything about TLB IPI and/or APIC interrupt
handling, we will gladly pay well for your expertise and
time. We need the help on a very short timeline.

A brief description of the problem is below. Please email
me or call me at 408-621-7054 (or the number below) if you
can help or would like further details.

Thanks!

Amy

--
Amy Abascal-Turner                   510-687-6741
Sr. Mgr. Product Support       amy@vasoftware.com
VA SOFTWARE CORP.       http://www.vasoftware.com
-------------------------------------------------

Brief History

A scientific computing cluster of 600 VA 1220s has been experiencing various problems under heavy load in production. VA engineers have been dedicated to identifying and solving these problems, and although the situation has vastly improved, it is still not completely resolved. The primary remaining issue is random rebooting in SMP mode, which makes the cluster as a whole unstable.

Technical Problems

Random Reboots: "TLB IPI Wait" errors, possibly indicative of kernel deadlock. This will require kernel-development expertise to resolve.

Reboots possibly indicative of an APIC interrupt handling problem, which will require kernel-development expertise to resolve.

Internal Clock Skew: resolved by replacing the motherboard on most nodes experiencing the problem. We suspect that some clock problems are side effects of the APIC/TLB issues noted.

Resolution

To resolve the reboot and TLB IPI issues, the expertise of a kernel developer is required. We are currently identifying resources to contract with to analyze the problem(s) and implement a solution.



