Re: [PATCH] mmc: core: Check for timeout before checking mmc device state

From: Matt Bennett
Date: Mon May 11 2015 - 17:56:07 EST


Hi Uffe,

We are using the Octeon mmc host driver supplied with the Cavium SDK (I
don't believe it has been released upstream). We have both an mmc flash
memory device and an SD card reader attached to the mmc bus.

In the host driver code there is a mutex which must be obtained before
the driver can access the mmc bus. This stops the mmc flash and SD card
reader being written to in parallel (otherwise the signals on the bus
would be corrupted). It doesn't prevent parallel requests from being
issued; the second request simply blocks on this mutex until the first
request has completed.

In our specific case the following is occurring:

1. mmc_blk_part_switch() is called to switch partition on the mmc flash
device. This calls mmc_switch() with a timeout_ms value of
'card->ext_csd.part_time', which is 10ms in this case.

2. In __mmc_switch() the command to switch partition is sent to the mmc
flash.

3. Between the command being sent to the flash and the host polling the
status of the device (there is no busy detection hardware), a read or
write operation begins on the SD card (in our case a Specification
Version 2.00 card). In my testing I have seen this operation block the
bus for up to 800ms.

4. The host polls the device for the status but blocks the first time on
the mutex for ~800ms while the SD card operation completes.

5. Finally the host gains the mutex and gets the status from the flash
device.

In my testing, by this stage the device had always left
'R1_STATE_PRG' (800ms have passed since the command was sent, after all).
However, the timeout check fails because 800ms have elapsed against the
original timeout_ms value of 10ms. Therefore, even though the device has
left the 'R1_STATE_PRG' state, we return early with an error that
eventually gets printed to the log. This does not affect functionality,
as the host will simply try to switch the partition again, and if the bus
does not block again then there are no issues.
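
As a rough illustration, the sequence above can be modelled in userspace
with a simulated clock. This is only a sketch of the ordering problem, not
the kernel code: 'struct sim_card', 'sim_card_in_prg()' and
'switch_poll_status_first()' are made-up names, the 1ms cost per status
command and the 5ms programming time used in the example call are
assumptions, and the plain integer comparison stands in for time_after().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simulated millisecond clock, standing in for jiffies. */
static unsigned long now_ms;

/* Hypothetical model of the flash device and the shared bus. */
struct sim_card {
	unsigned long cmd_sent_ms;  /* when the switch command was issued */
	unsigned long busy_ms;      /* time spent in the program state */
	unsigned long bus_block_ms; /* wait on the bus mutex for the first poll */
};

/* Request the status; returns true while the card is still programming. */
static bool sim_card_in_prg(struct sim_card *card)
{
	now_ms += card->bus_block_ms; /* blocked behind the SD card operation */
	card->bus_block_ms = 0;       /* only the first poll is delayed */
	now_ms += 1;                  /* assumed cost of the status command */
	return now_ms - card->cmd_sent_ms < card->busy_ms;
}

/* Current ordering: poll the status, then check the timeout, then the state. */
static int switch_poll_status_first(struct sim_card *card,
				    unsigned long timeout_ms)
{
	unsigned long deadline = now_ms + timeout_ms;
	bool in_prg;

	assert(timeout_ms > 0);
	card->cmd_sent_ms = now_ms;
	do {
		in_prg = sim_card_in_prg(card);
		/* Fires even if the card has already left the program state. */
		if (now_ms > deadline)
			return -ETIMEDOUT;
	} while (in_prg);

	return 0;
}
```

With 'busy_ms = 5', 'bus_block_ms = 800' and a 10ms timeout, the first
status poll returns "not programming" but the function still reports
-ETIMEDOUT, which matches the error we see in the log.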

By putting the timeout check before we read the status of the device
(a read which can block for longer than the timeout), we don't return an
early error if the device has indeed left the programming state. We
might as well continue through the function, since after we return the
error the host is just going to issue the command again.
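
To make the proposed ordering concrete, here is a minimal userspace
sketch with the timeout check at the top of the loop. It is not the
patched kernel function: 'struct poll_step', 'struct sim_host',
'sim_send_status()' and 'switch_timeout_first()' are hypothetical names,
the status polls are scripted rather than real, and a plain comparison
stands in for time_after().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One scripted status poll. */
struct poll_step {
	unsigned long blocks_ms; /* time the request waits on the bus */
	bool in_prg;             /* does the card then report the program state? */
};

/* Hypothetical host with a simulated clock in place of jiffies. */
struct sim_host {
	unsigned long now_ms;
	const struct poll_step *steps;
	size_t nsteps;
	size_t next;
};

/* Play back the next scripted poll; the last step repeats indefinitely. */
static bool sim_send_status(struct sim_host *host)
{
	size_t i = host->next < host->nsteps ? host->next++ : host->nsteps - 1;

	host->now_ms += host->steps[i].blocks_ms;
	return host->steps[i].in_prg;
}

/* Proposed ordering: check the timeout first, then poll the status. */
static int switch_timeout_first(struct sim_host *host,
				unsigned long timeout_ms)
{
	unsigned long deadline = host->now_ms + timeout_ms;
	bool in_prg;

	assert(host->nsteps > 0);
	do {
		/* Only reached again if the previous poll still saw PRG. */
		if (host->now_ms > deadline)
			return -ETIMEDOUT;
		in_prg = sim_send_status(host);
	} while (in_prg);

	return 0;
}
```

With a single scripted poll of '{ 800, false }' and a 10ms timeout this
returns 0, because the status obtained after the long block is what
decides the outcome. A card that keeps reporting the program state (for
example '{ 4, true }' repeated) still ends in -ETIMEDOUT on a later
iteration, so a genuinely stuck device is still caught.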

Please excuse me if I have missed something fundamental.

Thanks,
Matt



On Mon, 2015-05-11 at 12:00 +0200, Ulf Hansson wrote:
> On 8 May 2015 at 04:40, Matt Bennett <matt.bennett@xxxxxxxxxxxxxxxxxxx> wrote:
> > On a system that has multiple devices on the mmc bus the host can
> > block on the mutex that protects access to the bus. Some operations
>
> What mutex are you referring to?
>
> And why, exactly, does it prevent parallel requests on different cards
> (devices)?
>
> Kind regards
> Uffe
>
> > require the status of the device to be polled to see when the device
> > finishes executing the previous command that was sent to it (if
> > there is no busy detection in hardware). The current execution order
> > to check the status is:
> >
> > LOOP
> > {
> > 1. Send command to device to retrieve the status (this can block).
> > 2. Check we haven't exceeded the timeout value. If we have then
> > return an error.
> > 3. If the device is no longer in the program state then exit the
> > loop and continue through the function.
> > }
> >
> > If the send command blocks (and the timeout is exceeded) then the
> > function returns (and prints) an error even though the device has
> > likely left the programming state (due to the lengthy period of
> > time while the bus is blocked). By moving the timeout check before
> > retrieving the device status in the loop we better handle the case
> > where the mmc bus has been blocked but the device has left the
> > programming state.
> >
> > Signed-off-by: Matt Bennett <matt.bennett@xxxxxxxxxxxxxxxxxxx>
> > Cc: kuninori.morimoto.gx@xxxxxxxxxxx
> > Cc: jh80.chung@xxxxxxxxxxx
> > Cc: sboyd@xxxxxxxxxxxxxx
> > Cc: johan.rudholm@xxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > Cc: linux-mmc@xxxxxxxxxxxxxxx
> > ---
> > drivers/mmc/card/block.c | 22 +++++++++++-----------
> > drivers/mmc/core/core.c | 20 ++++++++++----------
> > drivers/mmc/core/mmc_ops.c | 14 +++++++-------
> > 3 files changed, 28 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
> > index 2c25271..0abefde 100644
> > --- a/drivers/mmc/card/block.c
> > +++ b/drivers/mmc/card/block.c
> > @@ -747,6 +747,17 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> > u32 status;
> >
> > do {
> > + /*
> > + * Timeout if the device never becomes ready for data and never
> > + * leaves the program state.
> > + */
> > + if (time_after(jiffies, timeout)) {
> > + pr_err("%s: Card stuck in programming state! %s %s\n",
> > + mmc_hostname(card->host),
> > + req->rq_disk->disk_name, __func__);
> > + return -ETIMEDOUT;
> > + }
> > +
> > err = get_card_status(card, &status, 5);
> > if (err) {
> > pr_err("%s: error %d requesting status\n",
> > @@ -766,17 +777,6 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms,
> > break;
> >
> > /*
> > - * Timeout if the device never becomes ready for data and never
> > - * leaves the program state.
> > - */
> > - if (time_after(jiffies, timeout)) {
> > - pr_err("%s: Card stuck in programming state! %s %s\n",
> > - mmc_hostname(card->host),
> > - req->rq_disk->disk_name, __func__);
> > - return -ETIMEDOUT;
> > - }
> > -
> > - /*
> > * Some cards mishandle the status bits,
> > * so make sure to check both the busy
> > * indication and the card state.
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index c296bc0..6e56cb3 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -2047,6 +2047,16 @@ static int mmc_do_erase(struct mmc_card *card, unsigned int from,
> >
> > timeout = jiffies + msecs_to_jiffies(MMC_CORE_TIMEOUT_MS);
> > do {
> > + /* Timeout if the device never becomes ready for data and
> > + * never leaves the program state.
> > + */
> > + if (time_after(jiffies, timeout)) {
> > + pr_err("%s: Card stuck in programming state! %s\n",
> > + mmc_hostname(card->host), __func__);
> > + err = -EIO;
> > + goto out;
> > + }
> > +
> > memset(&cmd, 0, sizeof(struct mmc_command));
> > cmd.opcode = MMC_SEND_STATUS;
> > cmd.arg = card->rca << 16;
> > @@ -2060,16 +2070,6 @@ static int mmc_do_erase(struct mmc_card *card, unsigned int from,
> > goto out;
> > }
> >
> > - /* Timeout if the device never becomes ready for data and
> > - * never leaves the program state.
> > - */
> > - if (time_after(jiffies, timeout)) {
> > - pr_err("%s: Card stuck in programming state! %s\n",
> > - mmc_hostname(card->host), __func__);
> > - err = -EIO;
> > - goto out;
> > - }
> > -
> > } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
> > (R1_CURRENT_STATE(cmd.resp[0]) == R1_STATE_PRG));
> > out:
> > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> > index 0ea042d..b30ed91 100644
> > --- a/drivers/mmc/core/mmc_ops.c
> > +++ b/drivers/mmc/core/mmc_ops.c
> > @@ -526,6 +526,13 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> > /* Must check status to be sure of no errors. */
> > timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > do {
> > + /* Timeout if the device never leaves the program state. */
> > + if (time_after(jiffies, timeout)) {
> > + pr_err("%s: Card stuck in programming state! %s\n",
> > + mmc_hostname(host), __func__);
> > + return -ETIMEDOUT;
> > + }
> > +
> > if (send_status) {
> > err = __mmc_send_status(card, &status, ignore_crc);
> > if (err)
> > @@ -545,13 +552,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> > mmc_delay(timeout_ms);
> > return 0;
> > }
> > -
> > - /* Timeout if the device never leaves the program state. */
> > - if (time_after(jiffies, timeout)) {
> > - pr_err("%s: Card stuck in programming state! %s\n",
> > - mmc_hostname(host), __func__);
> > - return -ETIMEDOUT;
> > - }
> > } while (R1_CURRENT_STATE(status) == R1_STATE_PRG);
> >
> > if (mmc_host_is_spi(host)) {
> > --
> > 1.9.1
> >