Re: [PATCH v2] mmc: dw_mmc: Disable SDIO interrupts while suspended to fix suspend/resume
From: Ulf Hansson
Date: Tue May 28 2019 - 15:25:42 EST
On Mon, 29 Apr 2019 at 22:41, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Processing SDIO interrupts while dw_mmc is suspended (or partly
> suspended) seems like a bad idea. We really don't want to be
> processing them until we've gotten ourselves fully powered up.
I fully agree.
Although, this is important not only from the host driver/controller
perspective, but also from the SDIO card (managed by mmc core) point
of view.
In $subject patch you mange the driver/controller issue, but only for
one specific host driver (dw_mmc). I am thinking that this problem may
be a rather common problem, so perhaps we should try to address this
from the core in a way that it affects all host drivers. Did you
consider that?
The other problem I refer to, is in principle a way to prevent
sdio_run_irqs() from being executed before the SDIO card has been
resumed, via mmc_sdio_resume(). It's a separate problem, but certainly
related. This may need some more thinking to address properly, let's
just keep this in mind and discuss this in a separate thread.
>
> You might be wondering how it's even possible to become suspended when
> an SDIO interrupt is active. As can be seen in
> dw_mci_enable_sdio_irq(), we explicitly keep dw_mmc out of runtime
> suspend when the SDIO interrupt is enabled. ...but even though we
> stop normal runtime suspend transitions when SDIO interrupts are
> enabled, the dw_mci_runtime_suspend() can still get called for a full
> system suspend.
>
> Let's handle all this by explicitly masking SDIO interrupts in the
> suspend call and unmasking them later in the resume call. To do this
> cleanly I'll keep track of whether the client requested that SDIO
> interrupts be enabled so that we can reliably restore them regardless
> of whether we're masking them for one reason or another.
>
> It should be noted that if dw_mci_enable_sdio_irq() is never called
> (for instance, if we don't have an SDIO card plugged in) that
> "client_sdio_enb" will always be false. In those cases this patch
> adds a tiny bit of overhead to suspend/resume (a spinlock and a
> read/write of INTMASK) but other than that is a no-op. The
> SDMMC_INT_SDIO bit should always be clear and clearing it again won't
> hurt.
Thanks for the detailed problem description. In general your approach
sounds okay to me, but I have a few questions.
1) As kind of stated above, did you consider a solution where the core
simply disables the SDIO IRQ in case it isn't enabled for system
wakeup? In this way all host drivers would benefit.
2) dw_mmc isn't calling device_init_wakeup() during ->probe(), hence I
assume it doesn't support the SDIO IRQ being configured as system
wakeup. Correct? I understand this is platform specific, but still it
would be good to know your view.
3) Because of 2) The below code in dw_mci_runtime_suspend(), puzzles me:
"if (host->slot->mmc->pm_flags & MMC_PM_KEEP_POWER)"
dw_mci_set_ios(host->slot->mmc, &host->slot->mmc->ios);
Why is 3) needed at all in case system wakeup isn't supported?
A note; the current support in the mmc core for the SDIO IRQ being
used as system wakeup, really needs some re-work. For example, we
should convert to use common wakeup interfaces, as to allow the PM
core to behave correctly during system suspend/resume. These are
changes that have been scheduled on my TODO list since long time ago,
I hope I can get some time to look into them soon.
>
> Without this fix it can be seen that rk3288-veyron Chromebooks with
> Marvell WiFi would sometimes fail to resume WiFi even after picking my
> recent mwifiex patch [1]. Specifically you'd see messages like this:
> mwifiex_sdio mmc1:0001:1: Firmware wakeup failed
> mwifiex_sdio mmc1:0001:1: PREP_CMD: FW in reset state
>
> ...and tracing through the resume code in the failing cases showed
> that we were processing a SDIO interrupt really early in the resume
> call.
>
> NOTE: downstream in Chrome OS 3.14 and 3.18 kernels (both of which
> support the Marvell SDIO WiFi card) we had a patch ("CHROMIUM: sdio:
> Defer SDIO interrupt handling until after resume") [2]. Presumably
> this is the same problem that was solved by that patch.
Seems reasonable.
>
> [1] https://lkml.kernel.org/r/20190404040106.40519-1-dianders@xxxxxxxxxxxx
> [2] https://crrev.com/c/230765
>
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
> I didn't put any "Fixes" tag here, but presumably this could be
> backported to whichever kernels folks found it useful for. I have at
> least confirmed that kernels v4.14 and v4.19 (as well as v5.1-rc2)
> show the problem. It is very easy to pick this to v4.19 and it
> definitely fixes the problem there.
>
> I haven't spent the time to pick this to 4.14 myself, but presumably
> it wouldn't be too hard to backport this as far as v4.13 since that
> contains commit 32dba73772f8 ("mmc: dw_mmc: Convert to use
> MMC_CAP2_SDIO_IRQ_NOTHREAD for SDIO IRQs"). Prior to that it might
> make sense for anyone experiencing this problem to just pick the old
> CHROMIUM patch to fix them.
>
> Changes in v2:
> - Suggested 4.14+ in the stable tag (Sasha-bot)
> - Extra note that this is a noop on non-SDIO (Shawn / Emil)
> - Make boolean logic cleaner as per https://crrev.com/c/1586207/1
> - Hopefully clear comments as per https://crrev.com/c/1586207/1
>
> drivers/mmc/host/dw_mmc.c | 27 +++++++++++++++++++++++----
> drivers/mmc/host/dw_mmc.h | 3 +++
> 2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
> index 80dc2fd6576c..480067b87a94 100644
> --- a/drivers/mmc/host/dw_mmc.c
> +++ b/drivers/mmc/host/dw_mmc.c
> @@ -1664,7 +1664,8 @@ static void dw_mci_init_card(struct mmc_host *mmc, struct mmc_card *card)
> }
> }
>
> -static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
> +static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, bool enb,
> + bool client_requested)
> {
> struct dw_mci *host = slot->host;
> unsigned long irqflags;
> @@ -1672,6 +1673,20 @@ static void __dw_mci_enable_sdio_irq(struct dw_mci_slot *slot, int enb)
>
> spin_lock_irqsave(&host->irq_lock, irqflags);
>
> + /*
> + * If we're being called directly from dw_mci_enable_sdio_irq()
> + * (which means that the client driver actually wants to enable or
> + * disable interrupts) then save the request. Otherwise this
> + * wasn't directly requested by the client and we should logically
> + * AND it with the client request since we want to disable if
> + * _either_ the client disabled OR we have some other reason to
> + * disable temporarily.
> + */
> + if (client_requested)
> + host->client_sdio_enb = enb;
> + else
> + enb &= host->client_sdio_enb;
> +
> /* Enable/disable Slot Specific SDIO interrupt */
> int_mask = mci_readl(host, INTMASK);
> if (enb)
> @@ -1688,7 +1703,7 @@ static void dw_mci_enable_sdio_irq(struct mmc_host *mmc, int enb)
> struct dw_mci_slot *slot = mmc_priv(mmc);
> struct dw_mci *host = slot->host;
>
> - __dw_mci_enable_sdio_irq(slot, enb);
> + __dw_mci_enable_sdio_irq(slot, enb, true);
>
> /* Avoid runtime suspending the device when SDIO IRQ is enabled */
> if (enb)
> @@ -1701,7 +1716,7 @@ static void dw_mci_ack_sdio_irq(struct mmc_host *mmc)
> {
> struct dw_mci_slot *slot = mmc_priv(mmc);
>
> - __dw_mci_enable_sdio_irq(slot, 1);
> + __dw_mci_enable_sdio_irq(slot, true, false);
> }
>
> static int dw_mci_execute_tuning(struct mmc_host *mmc, u32 opcode)
> @@ -2734,7 +2749,7 @@ static irqreturn_t dw_mci_interrupt(int irq, void *dev_id)
> if (pending & SDMMC_INT_SDIO(slot->sdio_id)) {
> mci_writel(host, RINTSTS,
> SDMMC_INT_SDIO(slot->sdio_id));
> - __dw_mci_enable_sdio_irq(slot, 0);
> + __dw_mci_enable_sdio_irq(slot, false, false);
> sdio_signal_irq(slot->mmc);
> }
>
> @@ -3424,6 +3439,8 @@ int dw_mci_runtime_suspend(struct device *dev)
> {
> struct dw_mci *host = dev_get_drvdata(dev);
>
> + __dw_mci_enable_sdio_irq(host->slot, false, false);
> +
> if (host->use_dma && host->dma_ops->exit)
> host->dma_ops->exit(host);
>
> @@ -3490,6 +3507,8 @@ int dw_mci_runtime_resume(struct device *dev)
> /* Now that slots are all setup, we can enable card detect */
> dw_mci_enable_cd(host);
>
> + __dw_mci_enable_sdio_irq(host->slot, true, false);
> +
> return 0;
>
> err:
> diff --git a/drivers/mmc/host/dw_mmc.h b/drivers/mmc/host/dw_mmc.h
> index 46e9f8ec5398..dfbace0f5043 100644
> --- a/drivers/mmc/host/dw_mmc.h
> +++ b/drivers/mmc/host/dw_mmc.h
> @@ -127,6 +127,7 @@ struct dw_mci_dma_slave {
> * @cmd11_timer: Timer for SD3.0 voltage switch over scheme.
> * @cto_timer: Timer for broken command transfer over scheme.
> * @dto_timer: Timer for broken data transfer over scheme.
> + * @client_sdio_enb: The value last passed to enable_sdio_irq.
> *
> * Locking
> * =======
> @@ -234,6 +235,8 @@ struct dw_mci {
> struct timer_list cmd11_timer;
> struct timer_list cto_timer;
> struct timer_list dto_timer;
> +
> + bool client_sdio_enb;
> };
>
> /* DMA ops for Internal/External DMAC interface */
> --
> 2.21.0.593.g511ec345e18-goog
>
Kind regards
Uffe