Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors
From: Doug Anderson
Date: Thu Mar 24 2016 - 12:06:59 EST
Russell,
On Thu, Mar 24, 2016 at 8:30 AM, Russell King - ARM Linux
<linux@xxxxxxxxxxxxxxxx> wrote:
> On Thu, Mar 24, 2016 at 12:26:43PM +0100, Enric Balletbo Serra wrote:
>> I just saw that Russell introduced a patch [1] that will land in 4.6.
>> I think that patch solves the same issue that we're trying to fix, but
>> for the sdhci controller.
>
> It doesn't sound like the same issue to me, though it was a long while
> back when I was looking at sdhci, so I may be misremembering.
>
>> The problem that we have on peach-pi, with our patch applied, is that
>> when we get a response CRC error on a command and we move to start
>> sending data, the transfer doesn't receive a timeout interrupt (I
>> don't know why). As I said, it works on rockchip due to the DTO quirk;
>> exynos is not using this quirk. Also, please correct me if I'm wrong,
>> but it looks like the sdhci controller has a timer to signal that the
>> command timed out.
>
> From what I remember, the problem I was seeing is that SDHCI sends a
> command (iirc, a tuning command), and receives a response CRC error.
> The card, however, knows nothing about the CRC error, so it moves into
> the transfer state.
>
> Meanwhile, SDHCI stopped processing the command, resetting the SDHCI
> controller and reporting the error to the upper layers.
>
> Then, a new command gets queued, issued to the card, and this fails
> because the card is still in transfer state. This totally screws up
> the SDHCI UHS tuning.
>
> This is not the only SDHCI UHS tuning bug: others exist which do not
> yet have patches, where we can get spurious false positives/false
> negatives for various tuning steps which totally confuse the code.
>
> From what you say above, your issue is that you get a response CRC
> error, but the dw MMC masks the data side, which sounds like a
> different solution is needed.
What I was seeing was that when the controller saw the CRC error, it
tried to abort with a "stop" command. You can see the
"send_stop_abort(host, data)" in dw_mmc.c. Then I saw:
> Sending the stop command after the "response CRC error" would
> then throw the system into a confused state causing all future
> tuning phases to report failure.
Presumably this is similar to what you saw: the host saw the CRC error
but the card knew nothing about it. Sending the stop command during
this time confused the card. Presumably the card was in transfer
state during this time?
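
To make the suspected sequence concrete, here is a toy model in C. It is
only a sketch of the reasoning above, not dw_mmc code: every name in it
(toy_card, card_tick, host_retries_immediately, and so on) is made up for
illustration, and it assumes the card simply refuses a new command until
it has finished sending the data it already committed to.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in dw_mmc.c or the MMC core. */
enum card_state { CARD_READY, CARD_BUSY_WITH_DATA };

struct toy_card {
	enum card_state state;
	int blocks_left;	/* data blocks the card still intends to send */
};

/*
 * The card accepts a data command. It never learns that the host saw a
 * CRC error on the response, so it starts sending data regardless.
 */
static void card_accept_data_cmd(struct toy_card *card, int blocks)
{
	card->state = CARD_BUSY_WITH_DATA;
	card->blocks_left = blocks;
}

/* One block time passes on the bus. */
static void card_tick(struct toy_card *card)
{
	if (card->state == CARD_BUSY_WITH_DATA && --card->blocks_left <= 0)
		card->state = CARD_READY;
}

/* Assumption: a new command is only handled cleanly once the card is ready. */
static bool card_accepts_new_cmd(const struct toy_card *card)
{
	return card->state == CARD_READY;
}

/* Policy A: treat the response CRC error as final and move on at once. */
static bool host_retries_immediately(struct toy_card *card)
{
	card_accept_data_cmd(card, 1);
	/* host sees the response CRC error here and gives up on the data */
	return card_accepts_new_cmd(card);
}

/* Policy B: after the response error, let the data phase run to its end. */
static bool host_waits_then_retries(struct toy_card *card)
{
	card_accept_data_cmd(card, 1);
	/* host sees the response CRC error here but keeps waiting */
	while (card->state == CARD_BUSY_WITH_DATA)
		card_tick(card);
	return card_accepts_new_cmd(card);
}
```

In this model host_retries_immediately() returns false on a fresh card,
because the card is still busy with its data block, while
host_waits_then_retries() returns true. Real cards and the real
controller are of course more complicated than this.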
-Doug