[PATCH]: partially fixes APIC interrupts to almost eliminate usb ohci hang on Nvidia MCP78S (nForce7xx, 8200, etc...) chipsets - help needed to fix this fully

From: Zbigniew Luszpinski
Date: Sat Jul 10 2010 - 19:34:51 EST


Hello,

long story short:

the io_apic2.patch provides two kernel parameters:
nofasteoiapic - replaces the fasteoi handler with the level handler for
all fasteoi interrupts.

nofasteoiapic=<list of irq numbers> - replaces the fasteoi handler with
the level handler for the given irqs only. This parameter does not work
yet: I made a mistake in its code which I cannot find.
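
One possible cause of the parameter bug, sketched in user-space C rather
than kernel code: CheckLevelNeeded() in the attached patch resets its
result whenever it meets a table entry below 1, so a match found early in
the list is thrown away as soon as the loop reaches the unused -1 slots
behind it. The function names and the corrected variant below are
hypothetical and only illustrate the idea; the table is assumed to hold
irq numbers followed by -1 fillers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LVL_IRQS_NR 24

/* Lookup as written in the patch: any entry below 1 resets the result,
 * so a match is lost once an unused slot follows it in the table. */
int check_level_needed_orig(const int *tbl, int irq)
{
    int i, result = 0;

    assert(tbl != NULL);
    for (i = 0; i < MAX_LVL_IRQS_NR; i++) {
        if (tbl[i] == irq)
            result = irq;
        if (tbl[i] < 1)
            result = 0;
    }
    return result;
}

/* Hypothetical corrected lookup: treat the first entry below 1 as the
 * end of the list and return as soon as the irq is found. */
int check_level_needed_fixed(const int *tbl, int irq)
{
    int i;

    assert(tbl != NULL);
    for (i = 0; i < MAX_LVL_IRQS_NR; i++) {
        if (tbl[i] < 1)
            break;
        if (tbl[i] == irq)
            return 1;
    }
    return 0;
}
```

Two further points may be worth checking against the kernel tree in use:
get_options() stores the count of parsed integers in element 0 of the
array, with the values starting at element 1, and early_param() appears to
match the parameter name without a trailing '=', in which case the handler
registered as "nofasteoiapic=" would never be called.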

why needed:
With the nofasteoiapic parameter active, this patch improves ohci
stability by 80% for medium speed usb devices on the Nvidia nForce MCP78S
chipset (10de:077b, 10de:077d usb ohci controllers). Without the patch any
usb 1.1 device works for a few minutes and then hangs at a random time
with a timeout: the usb device stops responding.
The patch does not help fast devices like usb audio - they keep
hanging. Only Linux has the hanging ohci; Windows XP does not. So this is
a software incompatibility.

What can be done but I cannot do myself:
-find a better solution that makes usb ohci 100% stable with all usb
devices without changing fasteoi to level.
-add autodetection so the patch applies only to the interrupt handlers of
10de:077b and 10de:077d. In the interrupt setup code Linux does not know
which device owns which interrupt, so it is hard/impossible to autodetect
and apply the patch only to the devices which need it.
-find the bug in the nofasteoiapic=<list of irq numbers> procedure.
-do not use interrupts for ohci - poll the i/o registers instead.
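
For the autodetection item, the ID test itself is the easy part; the hard
part, as noted above, is that the interrupt setup code does not know which
device sits behind an irq. A minimal sketch of the ID check only, in
user-space C. The function name and the constant name are illustrative,
and where such a check could be hooked into the kernel is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI vendor ID of Nvidia, as in the 10de:077b / 10de:077d pairs above. */
#define VENDOR_NVIDIA 0x10de

/* Returns true only for the two ohci controllers named in this report. */
bool is_mcp78s_ohci(uint16_t vendor, uint16_t device)
{
    if (vendor != VENDOR_NVIDIA)
        return false;
    return device == 0x077b || device == 0x077d;
}
```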

These tasks are for someone brave and skilled here. I do not feel skilled
enough to handle them; I barely managed to make the attached patch. If you
have any suggestions or pieces of code I could test (experimental fixes
which may help, or debug/diagnostic aids) please send them to me. I would
especially like to test code which uses polling instead of interrupts for
ohci only.

I reported this bug to Nvidia; they reproduced it and confirmed its
existence. The level interrupt handler improves ohci stability.
Unfortunately, so far they do not know how to fix this either.
This mailing list is the last hope. If nothing can be done we should
blacklist these mcp78s ohci controllers as broken, so that people stop
reporting all their usb devices as broken when it is actually the ohci
controller that breaks everything.

----

full history:

All Nvidia MCP78S family chipsets (nForce7xx, 8200, 9x00) probably have a
silicon bug which causes the integrated usb ohci controllers
(10de:077b, 10de:077d) to hang on Linux only. Windows XP SP3 is not
affected - even on a clean install from the bare Windows CD, without
external drivers. I'm very curious how they managed that only Linux hangs.
Oldest tested kernel: 2.6.18 from RHEL5; newest: 2.6.34.1.

The moment of the ohci hang depends on usb load - the bigger and more
constant the transfer, the sooner the hang happens. Let's divide usb 1.1
devices:
idle - usb devices are connected but do nothing - rock solid,
no hang.
slow - usb keyboard/mouse - no hang in normal use. A usb mouse can hang
ohci only if it is waved/moved around like crazy.
medium speed - usb adsl modem with a 1Mbit ISP subscription. Without the
patch ohci hangs after a few minutes of use; checking several rss channels
for news is enough to hang it. With the patch it does not hang. However,
opening 63 tabs in firefox at once will hang ohci even with the patch
enabled.
Without the patch, connecting a usb pendrive/hdd hangs ohci on plug-in or
soon after. With the patch enabled there is no hang.
fast - usb fm radio using alsa usb audio as the transport: 16bit 96kHz
stereo stream. Always hangs in less than 2 minutes whether the patch is
enabled or not. The same goes for the 4Mbit IrDA usb dongle. Only the
noapic kernel boot parameter makes these stable 90% of the time.

I checked the acpi tables and they are clean, so there is no Linux trap
in them. The bug exists not only on my mainboard but also on boards from
different manufacturers. All the mainboards with this bug have only one
thing in common: the Nvidia MCP78S chipset. So this must be a silicon bug
in the chipset.

After playing with kernel boot parameters I found that the noapic or
acpi=noirq parameters work around the bug in 95% of cases. acpi=noirq
stops acpi from being used for irq routing, which here appears to have
the same effect as noapic.

To fix this bug on Linux we have to make Linux behave like Windows XP.
I made the first step with the included patch. Linux by default uses the
fasteoi interrupt handler; Windows XP uses the level handler. When Linux
is forced by the patch to use the level interrupt handler, ohci is stable
80% of the time. In noapic mode it is 90% stable.
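
The handler choice which the patched ioapic_register_intr() makes can be
summarised as a small decision function. This is a user-space sketch only;
the enum and the function name are illustrative and are not kernel API.

```c
#include <stdbool.h>

/* Flow handlers the patched code can pick from. */
enum irq_flow { FLOW_EDGE, FLOW_FASTEOI, FLOW_LEVEL };

/*
 * level_triggered: the irq was detected or requested as level-triggered
 * no_fasteoi_all:  bare nofasteoiapic was given on the command line
 * irq_on_list:     the irq was named in nofasteoiapic=<list>
 */
enum irq_flow choose_flow(bool level_triggered, bool no_fasteoi_all,
                          bool irq_on_list)
{
    if (!level_triggered)
        return FLOW_EDGE;      /* edge irqs are not touched by the patch */
    if (no_fasteoi_all || irq_on_list)
        return FLOW_LEVEL;     /* forced level handler */
    return FLOW_FASTEOI;       /* Linux default for level-triggered irqs */
}
```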

The noapic solution is bad: it limits the CPU to 1 core only and ohci is
still not 100% stable :(
The nofasteoiapic parameter provided by the patch is better: 80%
stability and all cpu cores active, but usb audio still hangs and the
stability of other devices is weak.

My previous mainboard, based on the Nvidia MCP51 chipset, worked
excellently. After replacing it with an Nvidia MCP78S based mainboard the
usb ohci bug appeared.

List of hardware used:
previous mainboard: Asus A8N-VM CSM (MCP51 chipset works excellent)
current mainboard: Asrock K10N78FullHD-hSLI rev. 3.0 with current bios
(broken ohci usb only on Linux everything else excellent).

usb devices used:
pendrive: Kingston 8 GB
usb hdd: Seagate 80GB SATA1 in ICY BOX usb case
usb irda dongle: Stir4200 module/chipset/Linux driver
usb adsl modems: Speedtouch 330 and ZXDSL852 unicorn2 chipset/Linux driver
usb radio: Silabs fm radio usb: radio_usb_si470x linux driver
usb printer: hp deskjet 5940
usb keyboard: genius
usb mouse: Logitech pilot mouse and logitech trackman trackball and pixart
mouse

my bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=13405
(now I do not think this is acpi problem)

list of attached files:
io_apic2.patch - copy it to /usr/src/linux-2.6.34.1/arch/x86/kernel/apic
and run patch -p0 < io_apic2.patch
After compiling the kernel, boot the new kernel with the nofasteoiapic
parameter added.
ohcifail.tar.gz - dumps of dmesg, interrupts, and the important /proc and
/sys files.

have a nice day,
Zbigniew Luszpinski
--- io_apic.c.zby 2010-05-16 23:17:36.000000000 +0200
+++ io_apic.c 2010-07-10 21:27:55.000000000 +0200
@@ -74,6 +74,10 @@
*/
int sis_apic_bug = -1;

+bool noFastEoiHandler = 0;
+#define MAX_LVL_IRQS_NR 24
+int irq_lvl_required[MAX_LVL_IRQS_NR];
+
static DEFINE_RAW_SPINLOCK(ioapic_lock);
static DEFINE_RAW_SPINLOCK(vector_lock);

@@ -123,6 +127,27 @@
}
early_param("noapic", parse_noapic);

+static int __init parse_NoFastEoiApic(char *str)
+{
+ /* replace the default fasteoi interrupt handler with level one */
+ noFastEoiHandler = 1;
+ return 0;
+}
+early_param("nofasteoiapic", parse_NoFastEoiApic);
+
+static int __init parse_NoFastEoiApicAt(char *str)
+{
+ /* Reset level int table to default -1 */
+ int i;
+ for(i = 0; i < 24; i++) irq_lvl_required[i] = -1;
+ /* force level handler for irqs instead default fasteoi */
+ get_options(&str, MAX_LVL_IRQS_NR, irq_lvl_required);
+ for(i = 0; i < 24; i++) apic_printk(APIC_VERBOSE, KERN_INFO
+ "Interrupt table: position %d value %d\n", i, irq_lvl_required[i]);
+ return 0;
+}
+early_param("nofasteoiapic=", parse_NoFastEoiApicAt);
+
struct irq_pin_list {
int apic, pin;
struct irq_pin_list *next;
@@ -1326,6 +1351,21 @@
}
#endif

+int CheckLevelNeeded(int irq)
+{
+/* Looks if level irq is on the list */
+ int i, result = 0;
+ for(i = 0; i < MAX_LVL_IRQS_NR; i++)
+ {
+ if(irq_lvl_required[i] == irq) result = irq;
+ if(irq_lvl_required[i] < 1) result = 0;
+ }
+ apic_printk(APIC_VERBOSE, KERN_INFO "Interrupt found %d.\n", result);
+ return result;
+}
+
+
+
static void ioapic_register_intr(int irq, struct irq_desc *desc, unsigned long trigger)
{

@@ -1348,10 +1388,21 @@
}

if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
- trigger == IOAPIC_LEVEL)
+ trigger == IOAPIC_LEVEL) {
+ if (noFastEoiHandler)
+ set_irq_chip_and_handler_name(irq, &ioapic_chip,
+ handle_level_irq,
+ "level");
+ else if (CheckLevelNeeded(irq)) {
+ set_irq_chip_and_handler_name(irq, &ioapic_chip,
+ handle_level_irq,
+ "level");
+ }
+ else
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_fasteoi_irq,
"fasteoi");
+ }
else
set_irq_chip_and_handler_name(irq, &ioapic_chip,
handle_edge_irq, "edge");

Attachment: ohcifail.tar.gz
Description: application/compressed-tar
