Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338

From: Robert Hancock
Date: Wed Jul 30 2008 - 16:46:50 EST


Matthew Dharm wrote:
On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
Alan Stern wrote:
On Wed, 23 Jul 2008, Robert Hancock wrote:

It remains an issue, though, that when there is no underflow and the device reports an error in the CSW but doesn't provide sense data, we assume nothing bad happened and don't retry. That definitely does not seem correct. The device is not supposed to do this, but given how crappily some of these devices are designed, we should be more defensive.
The problem is, what can you do? The device has said that something was wrong, but it hasn't told you what. Without knowing what went wrong, you can't know how to recover.

Yes and no. If ASC/ASCQ is clear, then it's telling you that nothing is
wrong. The device is contradicting itself. That doesn't really help us
here, but it's a point I like to be clear on.

I suppose in such cases we could simply report that the command failed
completely.
I think that is what we need to do. The SCSI/block layers should retry the command or report a failure to userspace. Above all else we can't just continue on our merry way and assume success, otherwise data will get silently corrupted.

The code path to suppress the reporting of an error when auto-sense shows no
ASC/ASCQ was added for a reason. That reason has likely been lost to time,
but I worry about devices that are out there that rely on the current
behavior to function properly....

My original comment was that this code should be removed, but that was incorrect. In fact, that code path is unrelated to this problem, since it only executes if no transport error was detected. The code path is needed because sense data is retrieved for several reasons other than a transport failure. For one, "If we're running the CB transport, which is incapable of determining status on its own, we will auto-sense unless the operation involved a data-in transfer." In this case, for a successful transfer the status must be reset to good after getting the sense data.

In the case in question here, the BOT transport reports a failure, and we retrieve sense data, but the sense data doesn't indicate an error. This results in the failure essentially being ignored. In this case I think we should be doing the same thing as we do on detecting an underflow:

srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
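To make the proposal concrete, here is a rough standalone sketch of the decision I have in mind. It is not the actual usb-storage code: the enum, the struct and resolve_result() are hypothetical names made up for illustration, and the constant values are stand-ins that I believe match the kernel headers but should be treated as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kernel constants (assumed values, not taken from this thread). */
#define DID_ERROR                 0x07
#define SUGGEST_RETRY             0x10
#define SAM_STAT_GOOD             0x00
#define SAM_STAT_CHECK_CONDITION  0x02

/* Hypothetical: what the transport concluded before auto-sense ran. */
enum transport_status {
    TRANSPORT_GOOD,    /* no transport error detected (e.g. CB auto-sense) */
    TRANSPORT_FAILED   /* e.g. BOT CSW reported a failed command */
};

/* Hypothetical summary of the fields we look at in the sense data. */
struct sense_summary {
    uint8_t key;   /* sense key */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/* True when the sense data does not describe any error at all. */
static int sense_is_empty(const struct sense_summary *s)
{
    return s->key == 0 && s->asc == 0 && s->ascq == 0;
}

/* Decide the command result after auto-sense has completed. */
uint32_t resolve_result(enum transport_status t, const struct sense_summary *s)
{
    assert(s != NULL);

    /* Real sense data: let the upper layers interpret it. */
    if (!sense_is_empty(s))
        return SAM_STAT_CHECK_CONDITION;

    /* Transport said "failed" but sense is empty: the device contradicts
     * itself, so treat it like an underflow and ask for a retry. */
    if (t == TRANSPORT_FAILED)
        return ((uint32_t)DID_ERROR << 16) | ((uint32_t)SUGGEST_RETRY << 24);

    /* No transport error and empty sense: keep the existing behavior
     * and reset the status to good. */
    return SAM_STAT_GOOD;
}
```

The point of ordering the checks this way is that the existing "empty sense means success" path stays untouched for transports that auto-sense without a detected failure, and only the self-contradictory case changes.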

