Re: Endless loop with "detected XactErr"

From: Zdenek Kabelac
Date: Fri Apr 08 2011 - 11:34:15 EST


2011/4/8 Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>:
> On Fri, 8 Apr 2011, Zdenek Kabelac wrote:
>
>> >> Just with changing last number - I've got only one such line on serial console.
>> >> Looking into "drivers/usb/host/ehci-q.c" - there seems to be
>> >> an endless goto loop.
>> >
>> > It isn't endless.  You didn't notice the test against QH_XACTERR_MAX.
>> >
>>
>> Maybe I wasn't patient enough - but I waited for quite some time
>> and the console just kept scrolling this line - so I had to
>> turn off the machine.
>> (After about a minute of this looping)
>
> This is because the loop ends and then starts again (you can tell
> because the retry counter goes back to 1).
>
>> > This particular error message indicates a hardware problem in the USB
>> > signals.  A bad contact, a bad cable, a device failure -- something
>> > like that.
>>
>> Well, it's been connected to a Lenovo docking station - so I'm unsure
>> what bad cable you mean here.
>> (and AFAICT it seems to work quite well all the time).
>
> It doesn't have to be a bad cable.  It could be any sort of hardware
> failure.  Perhaps the docking or undocking operation itself messes
> something up.
>
>> >> Here is the trace from last resume:
>> >
>> >> [  473.873802] ACPI: \_SB_.GDCK - undocking
>> >> [  473.897678] hub 2-0:1.0: state 7 ports 4 chg 0000 evt 0010
>> >> [  473.904809] ehci_hcd 0000:00:1a.7: detected XactErr len 0/4 retry 1
>> >
>> > And presumably additional lines containing similar messages.
>> >
>> > There's no real reason for them ever to stop, since they are only
>> > debugging messages.  If you turn off CONFIG_USB_DEBUG you'll never see
>> > them.
>>
>>
>> You mean it's normal that the machine gets stuck in an endless loop
>> when I turn on debugging?
>
> Are you really sure the machine is stuck?  If you disable
> CONFIG_USB_DEBUG, does it hang or can you still get useful work done?
>
>> I thought debugging is there to debug the problem - not to add a new one?
>
> It doesn't add any problems, since you can always turn the debugging
> off.  And the messages do aid in debugging -- if they weren't present,
> you would not have been aware of the problem.
>
>> As this happened during suspend - I had no other option than to simply
>> turn off the machine.
>
> It doesn't appear that way in the log you posted.  The resume completed
> at timestamp 467.8, and the debugging messages didn't start until 6
> seconds later.
>
> You might want to track down the original source of the
> underlying error.  Since the machine was docked the entire time, the
> line saying:
>
> [  473.873802] ACPI: \_SB_.GDCK - undocking
>
> definitely looks suspicious.  I bet if you can fix that then the USB
> problems will go away.
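
To make the point about the retry counter concrete, here is a minimal
user-space sketch in C of a bounded retry loop of the kind described
above. It is not the code from drivers/usb/host/ehci-q.c: the limit
value of 32 and every sketch_* name are assumptions made purely for
illustration. It only shows why a persistent error prints "retry 1",
"retry 2", ... up to a limit, gives up, and then starts again at
"retry 1" when the next transfer is submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed limit for illustration; the kernel's constant is QH_XACTERR_MAX. */
#define SKETCH_XACTERR_MAX 32

/* Hypothetical stand-in for a queue head: only the error counter is modelled. */
struct sketch_qh {
    int xacterrs; /* consecutive transaction errors for the current transfer */
};

/* Hypothetical stand-in for the hardware: returns true if the transaction failed. */
typedef bool (*sketch_xact_fn)(void *ctx);

/*
 * Run one transfer, retrying on error while the counter stays below the
 * limit. Returns the number of retries (debug lines) that were issued.
 */
static int sketch_run_transfer(struct sketch_qh *qh, sketch_xact_fn xact, void *ctx)
{
    int retries = 0;

    assert(qh != NULL && xact != NULL);
    qh->xacterrs = 0; /* the counter starts fresh for every new transfer */

    while (xact(ctx)) {
        if (++qh->xacterrs >= SKETCH_XACTERR_MAX)
            break; /* limit reached: stop retrying and fail the transfer */
        printf("detected XactErr retry %d\n", qh->xacterrs);
        retries++;
    }
    return retries;
}

/* A device that never answers correctly. */
static bool sketch_always_fail(void *ctx)
{
    (void)ctx;
    return true;
}

/* A device that fails *ctx times and then succeeds. */
static bool sketch_fail_n(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return true;
    }
    return false;
}
```

Calling sketch_run_transfer() twice in a row with sketch_always_fail
prints two bounded bursts of messages, each starting at "retry 1". If
some caller keeps resubmitting the transfer, the output looks endless
on the console even though each individual loop terminates.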

Hmm, looking through more logs now - I might have used this:

"echo 1 >/sys/devices/platform/dock.0/undock"

for this particular testcase - I might have been checking several
combinations and modifying some pm script for this.
So yes - in this case the machine could have been undocked by the above command.

Is that a major problem for USB?

This machine gets into a busy loop at 473.9 (the last logged line) during suspend.
The laptop was in the dock - but was undocked by a software command.

Zdenek