Re: [PATCH] mmc: dw_mmc: Wait for data transfer after response errors

From: Russell King - ARM Linux
Date: Wed Mar 30 2016 - 13:26:29 EST

On Wed, Mar 30, 2016 at 07:16:18PM +0200, Enric Balletbo Serra wrote:
> 2016-03-24 17:22 GMT+01:00 Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx>:
> > On Thu, Mar 24, 2016 at 09:06:45AM -0700, Doug Anderson wrote:
> >> Russell,
> > ...
> >> Presumably this is similar to what you saw: the host saw the CRC error
> >> but the card knew nothing about it. Sending the stop command during
> >> this time confused the card. Presumably the card was in transfer
> >> state during this time?
> >
> > If the card was in transfer state for a command which expects a stop
> > command, and that stop command was issued after the card entered
> > the transfer state, then I'd expect the card to handle it... though
> > there's always the firmware bug issue.
> >
> > If the card hadn't entered transfer state at the time the stop command
> > was issued, I think that's more likely to hit card firmware issues.
> >
> > With the tuning commands, there's another case you can hit though:
> > the data transfer may have completed before you get around to sending
> > the stop command.
> >
> > That's why, for SDHCI, I came to the conclusion that waiting for the
> > data transfer to complete or time out was the best solution.
> >
> In fact I only saw the problem with dw_mmc-exynos; on dw_mmc-rockchip
> it doesn't happen because it enables the DW_MCI_QUIRK_BROKEN_DTO
> behaviour. What this does is use a kernel timer to signal when the DTO
> interrupt does NOT arrive. Note that if I disable this quirk I can also
> see the problem on rockchip.
>
> > Maybe, if sending a STOP command does cause card firmware issues, then:
> >
> > 1) it provides evidence that trying to send a stop command on a response
> > CRC error is the wrong thing to do (there was talk of making SDHCI
> > do this.)
> >
> It seems the same here, so I guess it is the wrong thing to do.
>
> > 2) it suggests that the solution I came up with for SDHCI is the better
> > solution, rather than trying to immediately recover the situation by
> > sending a STOP command.
> >
> I'm wondering whether just enabling this quirk on exynos too is the proper
> solution. Unfortunately I don't have enough documentation to check the
> differences between those controllers.
> It would also really help to have access to some hardware that uses
> dw_mmc-pltfm, to check whether the same issue is triggered there as on
> exynos. Is there anyone with the hardware who can do some tests?
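
A minimal userspace sketch of the ordering discussed above, offered only as an illustration: on a response CRC error for a data command, no STOP is sent; the request is completed only once the data-transfer-over (DTO) event is seen or a software deadline expires (the same idea as the timer behind DW_MCI_QUIRK_BROKEN_DTO). All names here (struct req_state, req_poll, the ERR_* codes) are hypothetical and do not come from dw_mmc or sdhci, and time is passed in as plain milliseconds instead of using kernel timers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes reported when the request finally completes. */
enum req_error {
    ERR_NONE = 0,      /* data phase completed, no error seen */
    ERR_RESP_CRC,      /* host saw a CRC error on the command response */
    ERR_DATA_TIMEOUT,  /* no DTO before the software deadline */
};

/* Hypothetical per-request state; not a dw_mmc or sdhci structure. */
struct req_state {
    bool resp_crc_error;      /* set by the command-done path on a CRC error */
    unsigned long start_ms;   /* time the data command was issued */
    unsigned long timeout_ms; /* software data-timeout budget */
    bool done;                /* request has been completed */
    enum req_error error;     /* valid once done is true */
};

/*
 * Called whenever the host is serviced: data_over is true if the DTO
 * interrupt has been seen, now_ms is the current time. There is
 * deliberately no "send STOP now" path: on a response CRC error the card
 * may still be in (or entering) the transfer state, so the request is
 * only completed once the data phase finishes or the deadline passes.
 * Returns true once the request is complete.
 */
bool req_poll(struct req_state *r, bool data_over, unsigned long now_ms)
{
    assert(r->timeout_ms > 0);

    if (r->done)
        return true;

    if (data_over) {
        /* Data phase finished: now it is safe to report the response error. */
        r->done = true;
        r->error = r->resp_crc_error ? ERR_RESP_CRC : ERR_NONE;
    } else if (now_ms - r->start_ms >= r->timeout_ms) {
        /* DTO never arrived: this is what a BROKEN_DTO-style timer catches.
         * Unsigned subtraction keeps the comparison correct across wrap. */
        r->done = true;
        r->error = r->resp_crc_error ? ERR_RESP_CRC : ERR_DATA_TIMEOUT;
    }
    return r->done;
}
```

For example, a request issued at t=100 ms with a 50 ms budget is still pending at t=120 ms even though the response CRC error is already known, and completes with that CRC error as soon as DTO is seen. Reporting the response error in preference to the timeout is a choice made for this sketch: it is the original failure, and the timeout only says the data phase never signalled completion.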

I'd really suggest that the dw-mmc folk place a moratorium on quirk
flags, and instead deal with situations like this without resorting
to this kind of thing.

sdhci is a good example of why the quirk flag approach is totally wrong,
and shows that it leads to an unmaintainable mess. If dw-mmc people
don't want the driver to descend into the same state that sdhci is in,
then things like this should not be quirks. sdhci already has a
long-term moratorium on quirk flags until the resulting mess has been
cleaned up.

The danger that quirk flags cause is also highlighted in your mail:
it's very likely that this _isn't_ a host controller issue at all,
but a MMC protocol issue or a card issue - and the behaviour required
here is not specific to any particular host controller. The problem
with having a quirk flag for it is that you end up with some hosts
enabling it, and other hosts having it disabled only because they
haven't yet tripped over the issue.
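
To make the last point concrete, here is a toy model, assuming a hypothetical quirk bit HOST_QUIRK_WAIT_DATA_ON_RESP_ERR (not a real dw_mmc or sdhci flag). Under the quirk policy, the recovery a host uses depends on whether someone has already hit the problem on that host; under a core-level policy, every host waits for the data phase, because the requirement comes from the MMC protocol or the card rather than the controller.

```c
/* Hypothetical quirk bit, for illustration only. */
#define HOST_QUIRK_WAIT_DATA_ON_RESP_ERR (1u << 0)

struct host {
    const char *name;
    unsigned int quirks;
};

enum recovery {
    RECOVER_SEND_STOP,  /* issue STOP straight after the response error */
    RECOVER_WAIT_DATA,  /* wait for the data phase to finish or time out */
};

/* Quirk policy: only hosts that have already tripped over the card or
 * protocol issue set the bit; every other host keeps sending STOP. */
enum recovery recovery_quirk(const struct host *h)
{
    return (h->quirks & HOST_QUIRK_WAIT_DATA_ON_RESP_ERR)
               ? RECOVER_WAIT_DATA
               : RECOVER_SEND_STOP;
}

/* Core policy: the requirement is not host specific, so the host is
 * not consulted at all. */
enum recovery recovery_generic(const struct host *h)
{
    (void)h;
    return RECOVER_WAIT_DATA;
}
```

With the quirk policy, a host without the bit still sends STOP even when it is talking to the same card; with the core policy, both hosts behave the same.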

