Re: BUG in mmc: core: Disable card detect during shutdown

From: Ulf Hansson
Date: Fri Jun 03 2022 - 06:47:24 EST


On Mon, 30 May 2022 at 18:55, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
> Hi Ulf,
> users reported a strange issue: the OMAP5-based Pyra does not
> shut down when running kernel 5.10.116.
>
> Someone did a bisect and found that reverting
>
> 0d66b395210c5084c2b7324945062c1d1f95487a
>
> or its upstream equivalent
>
> 66c915d09b942fb3b2b0cb2f56562180901fba17
>
> solves it.
>
> I could now confirm that it also happens with v5.18.0.
> But interestingly only on the Pyra handheld device and not
> on the omap5evm (which is supported by mainline).
>
> The symptom is:
>
> a) without revert
>
> root@letux:~# poweroff
>
> Broadcast message from root@letux (console) (Sat Jan 1 01:08:25 2000):
>
> The system is going down for system halt NOW!
> INIT: Sending processes the TERM signal
> root@letux:~# [info] Using makefile-style concurrent boot in runlevel 0.
> [....] Stopping cgroup management proxy daemon: cgproxy[....] Stopping cgroup management daemon: cgmanager[....] Stop[ ok bluetooth: /usr/sbin/bluetoothd.
> [FAIL] Stopping ISC DHCP server: dhcpd failed!
> dhcpcd[3055]: sending signal 15 to pid 2976
> dhcpcd[3055]: waiting for pid 2976 to exit
> [ ok ] Shutting down ALSA...done.
> [ ok ] Asking all remaining processes to terminate...done.
> [ ok ] All processes ended within 2 seconds...done.
> [ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
> [ ok ....] Deconfiguring network interfaces...done.
> ^[[c[info] Saving the system clock.
> [info] Hardware Clock updated to Sat Jan 1 01:08:30 UTC 2000.
> [ ok ] Deactivating swap...done.
> ^[[c[ 77.289332] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
> [info] Will now halt.
>
> b) with reverting your patch
>
> root@letux:~# uname -a
> Linux letux 5.18.0-letux-lpae+ #9678 SMP PREEMPT Mon May 30 18:02:28 CEST 2022 armv7l GNU/Linux
> root@letux:~# poweroff
>
> Broadcast message from root@letux (console) (Sat Jan 1 01:39:15 2000):
>
> The system is going down for system halt NOW!
> INIT: Sending processes the TERM signal
> root@letux:~# [info] Using makefile-style concurrent boot in runlevel 0.
> [FAIL] Stopping cgroup management proxy daemon: cgproxy[....] Stopping ISC DHCP server: dhcpd failed!
> [....] Stopping cgroup management daemon: cgmanagerdhcpcd[3100]: sending signal 15 to pid 3013
> dhcpcd[3100]: waiting for pid 3013 to exit
> [ ok ] Stopping bluetooth: /usr/sbin/bluetoothd.
> [ ok ] Shutting down ALSA...done.
> [ ok ] Asking all remaining processes to terminate...done.
> [ ok ] All processes ended within 3 seconds...done.
> [ ok [[c[....] Stopping enhanced syslogd: rsyslogd.
> [ ok ....] Deconfiguring network interfaces...done.
> ^[[c[info] Saving the system clock.
> [info] Hardware Clock updated to Sat Jan 1 01:39:21 UTC 2000.
> [ ok ] Deactivating swap...done.
> ^[[c[ 44.563256] EXT4-fs (mmcblk0p2): re-mounted. Quota mode: none.
> [info] Will now halt.
> [ 46.917534] reboot: Power down
>
>
> What I suspect is that we have multiple mmc interfaces and have
> card detect wired up in the Pyra while it is ignored in the
> EVM. Is it possible that __mmc_stop_host() never returns in
> .shutdown_pre if card detect is set up (and potentially
> shut down earlier)?
>
> Setup of mmc is done in omap5-board-common.dtsi and omap5.dtsi.
>
> Our Pyra has a non-upstream device tree that includes
> omap5-board-common.dtsi and overrides it with, e.g.,
>
> &mmc4 { /* second (u)SD slot (SDIO capable) */
> 	status = "okay";
> 	vmmc-supply = <&ldo2_reg>;
> 	pinctrl-names = "default";
> 	pinctrl-0 = <&mmc4_pins>;
> 	bus-width = <4>;
> 	cd-gpios = <&gpio3 13 GPIO_ACTIVE_LOW>; /* gpio3_77 */
> 	wp-gpios = <&gpio3 15 GPIO_ACTIVE_HIGH>; /* gpio3_79 */
> };
>
> But I have tried to remove the cd-gpios and wp-gpios, or to disable
> the mmc interface (but I may not have caught everything
> in the first place).
>
> Then I added some printk to mmc_stop_host() and __mmc_stop_host().
>
> mmc_stop_host() is not called but __mmc_stop_host() is called 4 times.
> There are 4 active MMC interfaces in the Pyra - 3 for (µ)SD slots
> and one for an SDIO WLAN module.
>
> Now it looks as if 3 of them are properly torn down (two of them
> seem to have host->slot.cd_irq >= 0) but on the fourth call
> cancel_delayed_work_sync(&host->detect); does not return. This is
> likely the location of the stall and why we don't see a "reboot: Power down".
>
> Any ideas?

I guess the call to cancel_delayed_work_sync() in __mmc_stop_host()
hangs for one of the mmc hosts. This shouldn't happen, and it indicates
that something else is wrong.

See more suggestions below.

>
> BR and thanks,
> Nikolaus
>
> printk hack:
>
> void __mmc_stop_host(struct mmc_host *host)
> {
> 	printk("%s 1\n", __func__);
> 	if (host->slot.cd_irq >= 0) {
> 		printk("%s 2\n", __func__);
> 		mmc_gpio_set_cd_wake(host, false);
> 		printk("%s 3\n", __func__);
> 		disable_irq(host->slot.cd_irq);
> 		printk("%s 4\n", __func__);
> 	}
>
> 	host->rescan_disable = 1;
> 	printk("%s 5\n", __func__);

My guess is that it's the same mmc host that causes the hang. I
suggest you print the name of the host too, to verify that. Something
along the lines of the below.

printk("%s: %s 5\n", mmc_hostname(host), __func__);

> 	cancel_delayed_work_sync(&host->detect);
> 	printk("%s 6\n", __func__);

Ditto.

> }
>
> resulting log:
>
> [info] Will now halt.
> [ 282.780929] __mmc_stop_host 1
> [ 282.784276] __mmc_stop_host 2
> [ 282.787735] __mmc_stop_host 3
> [ 282.791030] __mmc_stop_host 4
> [ 282.794235] __mmc_stop_host 5
> [ 282.797369] __mmc_stop_host 6
> [ 282.800918] __mmc_stop_host 1
> [ 282.804269] __mmc_stop_host 5
> [ 282.807541] __mmc_stop_host 6
> [ 282.810715] __mmc_stop_host 1
> [ 282.813842] __mmc_stop_host 2
> [ 282.816984] __mmc_stop_host 3
> [ 282.820175] __mmc_stop_host 4
> [ 282.823302] __mmc_stop_host 5
> [ 282.826449] __mmc_stop_host 6
> [ 282.830941] __mmc_stop_host 1
> [ 282.834076] __mmc_stop_host 5
>
> --- here should be another __mmc_stop_host 6
> --- and reboot: Power down

Once you have confirmed that it is always the same host that hangs, you
could try disabling that host in the DTS files (add status =
"disabled" to the device node, for example) and see if that works.
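
A minimal sketch of such an override, assuming purely for illustration
that mmc4 turns out to be the hanging host (nothing in this thread has
established which host it is, so substitute the label that the
mmc_hostname() output points to):

```dts
/*
 * Hypothetical board-level override: &mmc4 is only a placeholder,
 * taken from the node quoted earlier in this thread.
 */
&mmc4 {
	status = "disabled";
};
```

With the node disabled, the driver should not probe that controller, so
__mmc_stop_host() would not be called for it at shutdown.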

Kind regards
Uffe