Stability-Problem of EHCI with a larger number of USB-Hubs/Devices

From: Matthias Schniedermeyer
Date: Fri Jul 21 2006 - 15:10:15 EST


If you want to skip the introduction, skip to the part after "So much for the introduction:".

I have a fairly large "net" of USB-Devices, currently consisting of 10 USB-Hubs, 9 USB-Power-Switches and 34 USB-HDDs (normally switched off by the Power-Switches).

The USB-Hubs form a tree structure with the Power-Switches and USB-HDDs as the leaves.
There is one 4-Port-Hub connected to the Root-Hub on a PCI-Add-On USB2-Controller.
The first 2 Ports of the first-level Hub are connected to a second level of 4-Port Hubs, followed by a third level of "7"-Port Hubs (currently there are seven "7"-Port Hubs). As the "7"-Port Hubs are technically two 4-Port Hubs, they represent the 3rd and 4th levels in the same physical package, with the HDDs and Power-Switch devices as leaves at the 4th or 5th level.
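
To make the level numbering concrete, here is a minimal sketch of one "7"-Port Hub. It assumes the usual internal layout (a 4-Port Hub with a second 4-Port Hub wired to one of its ports, leaving 3 + 4 external ports); the function name and the exact port split are illustrative assumptions, not something stated in the post.

```python
def seven_port_hub_leaf_levels(hub_level):
    """Return the tree level of each of the 7 external ports.

    hub_level is the level of the first internal 4-port hub.
    Assumption: one of its 4 ports feeds the second internal hub,
    so 3 ports remain on the first hub and 4 on the second.
    """
    first_hub_ports = [hub_level + 1] * 3    # devices directly below the first internal hub
    second_hub_ports = [hub_level + 2] * 4   # devices below the cascaded internal hub
    return first_hub_ports + second_hub_ports


# The "7"-Port Hubs sit at level 3 in the tree described above.
print(seven_port_hub_leaf_levels(3))  # [4, 4, 4, 5, 5, 5, 5]
```

Under that assumption, three leaves of each package end up at the 4th level and four at the 5th, which matches the description.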

As the Power-Switching-Device I use can switch 4 different Power-Outlets, there is a Power-Switching-Device connected to every 5th Port, counting from the first one. An HDD is connected to each of the remaining Ports.

The system is designed so that I will have 112 (16*7) leaf Ports for 88 HDDs and 22 Power-Switch devices once I reach maximum capacity.
(Leaving 2 Ports unused, if anyone counted.)
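
The port budget can be checked with a few lines of arithmetic. This is only a sketch of the numbers given above; the variable names are made up, and it assumes ports are only used in complete groups of one Power-Switch device plus its four HDDs.

```python
SEVEN_PORT_HUBS = 16   # from the text: 16 * 7 leaf ports at full capacity
PORTS_PER_HUB = 7
HDDS_PER_SWITCH = 4    # one power switch controls 4 outlets, i.e. 4 HDDs

leaf_ports = SEVEN_PORT_HUBS * PORTS_PER_HUB
group_size = 1 + HDDS_PER_SWITCH             # one switch followed by its 4 HDDs

# Only complete groups are usable; the remainder stays empty.
switches, unused_ports = divmod(leaf_ports, group_size)
hdds = switches * HDDS_PER_SWITCH

print(leaf_ports, switches, hdds, unused_ports)  # 112 22 88 2
```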

I use a self-compiled 2.6.17 kernel with the USB-Sub-System built as modules, so I can load/remove the hcd-drivers, as the stability has degraded since at least kernel 2.6.15.

Yesterday I bought a new USB2-Controller card, since the former USB2-Controller card may have been damaged by an electrical problem I had a few months ago.

I also replaced all Hub-cables with ones I believe are of better quality than the former ones, as there are "Maybe the USB cable is bad" messages in the syslog.

The new Controller has a NEC-Chipset, which is ohci/ehci.
The former had a VIA-Chipset, so it was probably uhci/ehci, but I have not compiled the uhci-driver, as the onboard USB-Controller of the computer is ohci.

The computer in question is:
Dual P3-933MHz, Serverworks HE-SL chipset, 3GB-RAM
System: Debian-SID, (no UDEV)
cat /proc/version
version 2.6.17 (ms@frontend) (gcc version 4.1.2 20060613 (prerelease) (Debian 4.1.1-5)) #31 SMP Sun Jun 18 18:12:58 CEST 2006

So much for the introduction:
When I load the ohci_hcd driver I can see the Hub-Ports lighting up, and everything works fine most of the time. Sometimes only the sub-tree connected to the first port of the first-level Hub starts up, but after unplugging and re-plugging the cable on the second port, that sub-tree is initialized fine as well. With the ohci_hcd alone I can now use the system without problems.

But when I now load the ehci_hcd, I can see all lights going out, then coming on again as they do when the ohci_hcd starts, and after a few seconds everything goes out again and the syslog is flooded.

The other way is unplugging and powering down everything, loading ohci_hcd & ehci_hcd, switching everything on and plugging the cable into the first-level Hub. This seems to initialize correctly, but breaks down a few seconds after everything seemed to be ready.

I've attached a file with the relevant syslog extract, and the config of the kernel in question.

I did the following to create the log:
- I unplugged the first-level Hub and physically switched off the whole tree
- I loaded the ohci_hcd & ehci_hcd
(usbcore, usb_hid, usb_storage were also loaded, IOW the whole compiled USB-Subsystem was loaded)
- Then I switched on everything and plugged in the cable to the first-level Hub.
- Every light went on, stayed on for a few seconds and then everything went down
- I unloaded ehci_hcd & ohci_hcd and stopped the tail on syslog.
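
The software side of these steps can be sketched as a small script. This is a hypothetical helper, not what was actually run for the log: it assumes the drivers are loaded with `modprobe` and removed with `rmmod`, it defaults to a dry run that only prints the commands, and it leaves out both the manual steps and the syslog capture.

```python
import subprocess

# Load order as described in the steps above.
MODULES = ["ohci_hcd", "ehci_hcd"]


def reproduction_commands(modules=MODULES):
    """Build the load commands and the unload commands (reverse order)."""
    load = [["modprobe", m] for m in modules]
    unload = [["rmmod", m] for m in reversed(modules)]
    return load, unload


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


load, unload = reproduction_commands()
run(load)
# Manual part: switch on the tree, plug in the first-level hub,
# wait until everything goes down again.
run(unload)
```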

Those few seconds produced the attached 192KB log file.

Technically I have had this same problem since at least 2.6.15, or since about the beginning of this year, but it worked most of the time (and there were fewer HDDs ...). I only have to use the system once in a while, since I use a S-ATA "staging" HDD to accumulate a few Gigabytes before moving them onto the USB-HDDs (which represent >7.5 TB), but it is getting more and more annoying.

Any ideas what my problem might be?

See you

Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

Attachment: usb_syslog.txt.bz2
Description: Unix tar archive

Attachment: config-2.6.17-leloo.bz2
Description: Unix tar archive