Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
From: Robert Hancock
Date: Tue Jul 22 2008 - 04:45:53 EST
Tomas Styblo wrote:
* Robert Hancock <hancockr@xxxxxxx> [Tue, 22 Jul 2008]:
In any case, given that your code apparently fixes the corruption, it seems
that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the
DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the
SCSI layer looks at the sense data and says "hmm, nothing to worry about
here" and carries on.
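To make the point about the result word concrete, here is a small user-space sketch of how the 32-bit SCSI result is split into bytes. The constant values are redefined locally, assuming they match the 2.6-era <scsi/scsi.h> (check them against your own headers), and result_flags_error() is a hypothetical, much-simplified stand-in for the midlayer's decision, not how the real SCSI error handling is written. It shows that a bare SAM_STAT_CHECK_CONDITION leaves the host and driver bytes at zero, so only the sense data can tell the midlayer something went wrong.

```c
#include <stdint.h>

/* Values assumed to match 2.6-era <scsi/scsi.h>; redefined here so the
 * sketch compiles on its own in user space. */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DID_OK                   0x00
#define DID_ERROR                0x07
#define SUGGEST_RETRY            0x10
#define SUGGEST_MASK             0xf0

/* Layout of the result word:
 *   bits  0-7  : SAM status byte (e.g. CHECK CONDITION)
 *   bits  8-15 : message byte
 *   bits 16-23 : host byte   (DID_*)
 *   bits 24-31 : driver byte (DRIVER_* in the low nibble, SUGGEST_* in the high nibble)
 */
static uint32_t sam_status_of(uint32_t result)  { return result & 0xff; }
static uint32_t host_byte_of(uint32_t result)   { return (result >> 16) & 0xff; }
static uint32_t driver_byte_of(uint32_t result) { return (result >> 24) & 0xff; }

/* Hypothetical, simplified check: does the result word itself (ignoring
 * any sense data) carry an error or retry indication? */
static int result_flags_error(uint32_t result)
{
    return host_byte_of(result) != DID_OK ||
           (driver_byte_of(result) & SUGGEST_MASK) == SUGGEST_RETRY;
}
```

For example, result_flags_error(SAM_STAT_CHECK_CONDITION) returns 0, while a result built as (DID_ERROR << 16) | (SUGGEST_RETRY << 24) returns 1.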
That's exactly what I thought was happening, after a cursory
look at the SCSI code.
I think we do need something like your patch, though it should likely be
moved inside the if (need_auto_sense) check, and I don't see a reason to
limit to this device ID only.
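A rough model of the suggested placement, as a user-space sketch: the check only runs on the auto-sense path and applies to every device rather than a single vendor/product ID. The function fixup_empty_sense() and its parameters are hypothetical names for illustration, not the usb-storage code or the actual patch; the constant values are assumed to match the 2.6-era kernel headers. It assumes fixed-format sense data, where the sense key sits in the low nibble of byte 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Values assumed to match 2.6-era <scsi/scsi.h>; redefined locally. */
#define SAM_STAT_CHECK_CONDITION 0x02
#define DID_ERROR                0x07
#define SUGGEST_RETRY            0x10
#define NO_SENSE                 0x00   /* sense key: nothing to report */

/* Hypothetical model: if the transport reported failure but the device
 * returned no usable sense key, turn the result into an explicit host
 * error with a retry suggestion instead of a bare CHECK CONDITION. */
static uint32_t fixup_empty_sense(int need_auto_sense, int transport_failed,
                                  const uint8_t *sense, size_t sense_len,
                                  uint32_t result)
{
    /* Only act inside the auto-sense path, for any device ID. */
    if (!need_auto_sense)
        return result;

    /* Too short to hold a sense key, or sense key == NO SENSE. */
    int no_sense = sense_len < 3 || (sense[2] & 0x0f) == NO_SENSE;

    if (transport_failed && no_sense)
        return ((uint32_t)DID_ERROR << 16) | ((uint32_t)SUGGEST_RETRY << 24);

    return result;
}
```

With an all-zero sense buffer and a failed transport, the function returns the DID_ERROR/SUGGEST_RETRY combination; if the device did supply a real sense key (say MEDIUM ERROR), the result is left alone so the normal sense handling applies.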
Thank you. This is a very insidious bug: it doesn't manifest
itself very often, so months of data corruption may pass before you
notice it.
So is there a bug in the chipset, or does the error handling code
not follow the specification?
It looks clear to me that it's a bug in the chipset. It's supposed to
set some valid sense data if an error occurs, not just set the "failed"
flag in the USB storage status word. (Presumably the fact that these
errors are occurring in the first place is a bug in itself, though that
could be a problem with the enclosure or drive as well.)
However, the kernel should be more robust and not ignore the error
indication that the chipset is giving.
I wonder whether the company that makes the chipset should be notified
about this problem.
I suppose it wouldn't hurt to let JMicron know about this. I doubt they
could do anything for existing chipsets, but it might help them avoid
this bug in future designs.