Re: [PATCH] mmc: Some Micron eMMC devices cause reboot to hang

From: Alan Cooper
Date: Thu Aug 13 2020 - 14:25:00 EST


On Wed, Aug 5, 2020 at 4:28 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>
> On Mon, 27 Jul 2020 at 15:07, Alan Cooper <alcooperx@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 24, 2020 at 7:03 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > >
> > > On Tue, 21 Jul 2020 at 21:18, Al Cooper <alcooperx@xxxxxxxxx> wrote:
> > > >
> > > > When using eMMC as the boot device, some Micron eMMC devices will
> > > > cause reboot to hang. This is a result of the eMMC device not going
> > > > into boot mode after the hardware sends CMD0 to reset the eMMC
> > > > device. This only happens if the kernel driver sends CMD5 (SLEEP_WAKE),
> > > > to put the device into sleep state, before restarting the system.
> > >
> > > What do you mean by "boot mode"?
> >
> > I'm referring to the "Boot operation mode" described in Section 6.3 of
> > the JEDEC spec.
> > Our hardware will send a CMD0 with 0xf0f0f0f0 argument at powerup or
> > when the SoC is reset, and then hold the CLK signal low for 74 clock
> > cycles. This should put the eMMC device into boot mode where it
> > streams consecutive blocks without additional commands. With this
> > Micron device I find that if I send a CMD5 before trying to restart
> > the system by resetting the SoC, that the system hangs. I worked with
> > Micron on the issue and they finally said to either avoid sending the
> > CMD5 on restart or use a newer version of the Micron eMMC device.
>
> Thanks for clarifying the test sequence!
>
> However, I am still not (yet) convinced that a card quirk is the right
> thing to do. What does the eMMC spec say about sending a CMD0 with
> 0xf0f0f0f0 to a device that "sleeps"?

This is from the spec:
A Device may be switched between a Sleep state and a Standby state by
SLEEP/AWAKE (CMD5). In the
Sleep state the power consumption of the memory device is minimized.
In this state the memory device
reacts only to the commands RESET (CMD0 with argument of either
0x00000000 or 0xF0F0F0F0 or
H/W reset) and SLEEP/AWAKE (CMD5). All the other commands are ignored
by the memory device.


>
> Moreover, how does your mmc host driver (and platform) treat VCC and
> VCCQ at system suspend/resume, compared to when a reset is done? Is
> there a difference?

We don't change VCC and VCCQ during suspend.

>
> The point is, if the eMMC spec is being violated, we should not make a
> card quirk - as it may cause problems for other platforms.

The eMMC spec is being violated here by the Micron eMMC device.
Our hardware has worked this way for over 10 years and has never had a
problem with any of the eMMC devices from the major manufacturers
including older and newer Micron eMMC devices.
When I talked to Micron on the phone, one engineer said it was a
firmware bug, but since it was an older chip that it wouldn't be fixed
and that we should use a newer family of Micron eMMC devices.

>
> >
> >
> > >
> > > When the kernel sends the CMD0 to wake up the eMMC from sleep, at
> > > system resume for example, it all works fine, I guess. What is the
> > > difference?
> >
> > On system resume the hardware will not try to put the eMMC device back
> > into boot mode.
>
> I see.
>
> Does your host driver support HW busy signalling, so DAT0 is monitored
> for de-assertion to confirm the CMD5 is completed by the kernel - or
> do you rely on the per card sleep timeout to be used in mmc_sleep()?

We use the per card sleep delay.

>
> Additionally, I wonder about what options you have to reset the eMMC?
> Can we use something along the lines of
> drivers/mmc/core/pwrseq_emmc.c? If it's not possible to do a HW reset,
> we could try sending CMD0 with argument being '0' in the reset path.

The problem is that the system is rebooting and it's the hardware that
automatically sends the CMD0/0xf0f0f0f0 because there's no software
running yet. This can't be changed.

>
> What do you think?
>
> >
> > Al
> >
> > >
> > > > The fix is to add a quirk that avoids sending the SLEEP command
> > > > and to use MMC_FIXUP to set the quirk for these Micron devices.
> > >
> > > I am not sure this is Micron device specific, but rather some it's a
> > > driver/platform bug. Maybe on the kernel side or in the bootloader
> > > code.
> > >
>
> Kind regards
> Uffe
>
> > >
> > > >
> > > > Signed-off-by: Al Cooper <alcooperx@xxxxxxxxx>
> > > > ---
> > > > drivers/mmc/core/mmc.c | 3 ++-
> > > > drivers/mmc/core/quirks.h | 8 ++++++++
> > > > include/linux/mmc/card.h | 1 +
> > > > 3 files changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> > > > index 4203303f946a..4d69e8f8fe59 100644
> > > > --- a/drivers/mmc/core/mmc.c
> > > > +++ b/drivers/mmc/core/mmc.c
> > > > @@ -1895,7 +1895,8 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
> > > >
> > > > static int mmc_can_sleep(struct mmc_card *card)
> > > > {
> > > > - return (card && card->ext_csd.rev >= 3);
> > > > + return card && card->ext_csd.rev >= 3 &&
> > > > + ((card->quirks & MMC_QUIRK_BROKEN_SLEEP) == 0);
> > > > }
> > > >
> > > > static int mmc_sleep(struct mmc_host *host)
> > > > diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
> > > > index 472fa2fdcf13..7263187b6323 100644
> > > > --- a/drivers/mmc/core/quirks.h
> > > > +++ b/drivers/mmc/core/quirks.h
> > > > @@ -99,6 +99,14 @@ static const struct mmc_fixup mmc_blk_fixups[] = {
> > > > MMC_FIXUP("V10016", CID_MANFID_KINGSTON, CID_OEMID_ANY, add_quirk_mmc,
> > > > MMC_QUIRK_TRIM_BROKEN),
> > > >
> > > > + /*
> > > > + * Some Micron eMMC devices will not go into boot mode on
> > > > + * CMD0 arg: 0XF0F0F0F0 after going into SLEEP state.
> > > > + * This will hang a reboot.
> > > > + */
> > > > + MMC_FIXUP(CID_NAME_ANY, CID_MANFID_NUMONYX, 0x014e, add_quirk_mmc,
> > > > + MMC_QUIRK_BROKEN_SLEEP),
> > > > +
> > > > END_FIXUP
> > > > };
> > > >
> > > > diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> > > > index 7d46411ffaa2..0cdddcb5e17d 100644
> > > > --- a/include/linux/mmc/card.h
> > > > +++ b/include/linux/mmc/card.h
> > > > @@ -270,6 +270,7 @@ struct mmc_card {
> > > > #define MMC_QUIRK_BROKEN_IRQ_POLLING (1<<11) /* Polling SDIO_CCCR_INTx could create a fake interrupt */
> > > > #define MMC_QUIRK_TRIM_BROKEN (1<<12) /* Skip trim */
> > > > #define MMC_QUIRK_BROKEN_HPI (1<<13) /* Disable broken HPI support */
> > > > +#define MMC_QUIRK_BROKEN_SLEEP (1<<14) /* Broken sleep mode */
> > > >
> > > > bool reenable_cmdq; /* Re-enable Command Queue */
> > > >
> > > > --
> > > > 2.17.1
> > > >