Re: [2.6.30-rc2] usb reset during big file transfer and ext3 error

From: Rogério Brito
Date: Wed Apr 22 2009 - 18:08:15 EST


Hi, Robert.

On Apr 21 2009, Robert Hancock wrote:
> (ccing linux-usb)

Ok.

> Rogério Brito wrote:
(...)
>> Unfortunately, when I was transferring the contents of 2 DVDs from the
>> main IDE HD to a USB external HD, I got errors from the USB host, the
>> writes to the external HD started failing and the ext3 filesystem there
>> went into error mode, going read-only.
>>
>> I eventually lose the access to the device (i.e., the /dev/sd??? device
>> isn't there anymore) and I then have to re-run fsck on the given
>> filesystem.
>>
>> This has happened 2 or 3 times already and I observed that it
>> only occurs when there is high traffic---if I am, say, compiling the
>> kernel on that external HD, I don't see any problems.

I just saw it happen once more, this time triggering a stack trace
related to ext3. :-(

>> Attached is part of the dmesg log that shows the problem. I put the
>> whole dmesg at <http://rb.doesntexist.org/linux/>.
>>
>> As always, if any further information is needed, please let me know.
>
> You're seeing these:
>
> [103051.265045] ehci_hcd 0000:00:1d.7: detected XactErr len 1536/4096 retry 1
> [103051.265156] ehci_hcd 0000:00:1d.7: detected XactErr len 1536/4096 retry 2
> [103051.265281] ehci_hcd 0000:00:1d.7: detected XactErr len 1536/4096 retry 3
> [103051.265406] ehci_hcd 0000:00:1d.7: detected XactErr len 1536/4096 retry 4

Precisely.
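
In case it helps anyone check their own logs for the same pattern, here
is a minimal sketch in Python that counts these messages per controller
and reports the highest retry number seen. It assumes each message sits
on a single line, as dmesg prints it, and that the message format is
exactly the one quoted above. The function name summarize_xacterr and
the script name used below are made up for illustration; nothing here
comes from the kernel tree.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
# [103051.265045] ehci_hcd 0000:00:1d.7: detected XactErr len 1536/4096 retry 1
# Assumption: the format is exactly the one quoted in this thread.
XACTERR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ehci_hcd\s+(?P<dev>\S+):\s+"
    r"detected XactErr len (?P<done>\d+)/(?P<total>\d+)\s+retry (?P<retry>\d+)"
)


def summarize_xacterr(lines):
    """Return {controller: (message_count, highest_retry_seen)}."""
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        match = XACTERR_RE.search(line)
        if match is None:
            continue  # not an XactErr message
        entry = summary[match.group("dev")]
        entry[0] += 1
        entry[1] = max(entry[1], int(match.group("retry")))
    return {dev: tuple(counts) for dev, counts in summary.items()}


if __name__ == "__main__":
    # Reads log lines from standard input and prints one line per controller.
    for dev, (count, max_retry) in summarize_xacterr(sys.stdin).items():
        print(f"{dev}: {count} XactErr messages, highest retry {max_retry}")
```

Saved under a name such as xacterr_summary.py (hypothetical), it could be
fed with "dmesg | python3 xacterr_summary.py".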

> According to the EHCI spec, XactErr is "Set to a one by the Host
> Controller during status update in the case where the host did not
> receive a valid response from the device (Timeout, CRC, Bad PID,
> etc.)"

Is there any way of controlling the number of retries in the host
controller? Or, perhaps, of controlling the time between retries so that
the device has a chance to recover?

> Quite likely this is some kind of hardware problem - maybe the USB
> port doesn't quite provide enough power for the drive, etc.

I see. My first thought on reading this comment of yours was that there
could be some heat issue, with the drive not cooling down properly.

In this particular case, the USB enclosure is externally powered and it
contains a SATA drive. I had also never seen this occur when the drive
was connected to an EHCI port on another system, even while transferring
more data.

> A lot of these USB enclosure devices are also rather poor quality in
> general..

Agreed. Not everybody does things correctly by the book. OTOH, these are
the devices present in "the real world". Would there be workarounds for
such situations?


Thanks, Rogério Brito.

--
Rogério Brito : rbrito@{mackenzie,ime.usp}.br : GPG key 1024D/7C2CAEB8
http://www.ime.usp.br/~rbrito : http://meusite.mackenzie.com.br/rbrito
Projects: algorithms.berlios.de : lame.sf.net : vrms.alioth.debian.org