Re: mmc: core: Disable card detect during shutdown

From: Tim Harvey
Date: Fri Mar 03 2023 - 18:38:37 EST


On Thu, Mar 2, 2023 at 2:37 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>
> + Robert
>
> On Thu, 2 Mar 2023 at 00:32, Tim Harvey <tharvey@xxxxxxxxxxxxx> wrote:
> >
> > Greetings,
> >
> > I've encountered a hang on shutdown on OcteonTX (CN8030 SoC, ThunderX
> > architecture) that I bisected to commit 66c915d09b94 ("mmc: core:
> > Disable card detect during shutdown").
> >
> > It looks like the OMAP5 Pyra ran into this as well, related to a
> > malfunctioning driver [1].
> >
> > In the case of MMC_CAVIUM_THUNDERX, the host controller supports
> > multiple slots, each with its own CMD signal but with shared CLK/DATA
> > lines, as described by the following DT:
> >
> > mmc@1,4 {
> > 	compatible = "cavium,thunder-8890-mmc";
> > 	reg = <0xc00 0x00 0x00 0x00 0x00>;
> > 	#address-cells = <0x01>;
> > 	#size-cells = <0x00>;
> > 	clocks = <0x0b>;
> >
> > 	/* eMMC */
> > 	mmc-slot@0 {
> > 		compatible = "mmc-slot";
> > 		reg = <0>;
> > 		vmmc-supply = <&mmc_supply_3v3>;
> > 		max-frequency = <35000000>;
> > 		no-1-8-v;
> > 		bus-width = <8>;
> > 		no-sdio;
> > 		no-sd;
> > 		mmc-ddr-3_3v;
> > 		cap-mmc-highspeed;
> > 	};
> >
> > 	/* microSD */
> > 	mmc-slot@1 {
> > 		compatible = "mmc-slot";
> > 		reg = <1>;
> > 		vmmc-supply = <&mmc_supply_3v3>;
> > 		max-frequency = <35000000>;
> > 		no-1-8-v;
> > 		broken-cd;
> > 		bus-width = <4>;
> > 		cap-sd-highspeed;
> > 	};
> > };
> >
> > mmc_add_host is only called once for mmc0 and I can't see any printk
>
> That looks wrong. There needs to be one mmc host registered per slot,
> otherwise things will, for sure, not work.
>
> I suggest you have a closer look to see what goes on in thunder_mmc_probe().
>

Ulf,

Sorry, I was mistaken. Each slot does get its own mmc host.

I find that with thunderx_mmc I can reproduce this hang on shutdown
even if I just have a single slot with broken-cd defined.

I wonder if it has to do with thunder_mmc_probe() being called
multiple times, because it defers while the GPIO/regulator is not yet
available:
[ 6.846262] thunderx_mmc 0000:01:01.4: Adding to iommu group 1
[ 6.852143] thunder_mmc_probe
[ 6.855622] thunder_mmc_probe scanning slots
[ 6.860137] mmc_alloc_host: mmc0 init delayed work
[ 6.864938] cvm_mmc_of_slot_probe mmc0
[ 6.868695] cvm_mmc_of_slot_probe mmc0 Failed: EPROBE_DEFER
[ 6.874269] mmc_free_host: mmc0
[ 6.877481] thunder_mmc_probe Failed: EPROBE_DEFER
...
[ 7.737536] gpio_thunderx 0000:00:06.0: Adding to iommu group 16
[ 7.745252] gpio gpiochip0: (gpio_thunderx): not an immutable chip, please consider fixing it!
[ 7.754096] gpio_thunderx 0000:00:06.0: ThunderX GPIO: 48 lines with base 512.
...
[ 7.946636] thunder_mmc_probe
[ 7.950125] thunder_mmc_probe scanning slots
[ 7.954597] mmc_alloc_host: mmc0 init delayed work
[ 7.959399] cvm_mmc_of_slot_probe mmc0
[ 7.963158] cvm_mmc_of_slot_probe mmc0 Failed: EPROBE_DEFER
[ 7.968732] mmc_free_host: mmc0
[ 7.971963] thunder_mmc_probe Failed: EPROBE_DEFER
...
[ 7.998271] reg_fixed_voltage_probe
[ 8.001773] reg-fixed-voltage mmc_supply_3v3: reg_fixed_voltage_probe
[ 8.008360] reg-fixed-voltage mmc_supply_3v3: mmc_supply_3v3 supplying 3300000uV
[ 8.015851] thunder_mmc_probe
[ 8.019318] thunder_mmc_probe scanning slots
[ 8.023794] mmc_alloc_host: mmc0 init delayed work
[ 8.028596] cvm_mmc_of_slot_probe mmc0
[ 8.032488] mmc_add_host: mmc0
[ 8.060655] cvm_mmc_of_slot_probe mmc0 ok
[ 8.064678] thunderx_mmc 0000:01:01.4: probed
[ 8.069041] mmc_rescan: mmc0 irq=-22
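To make the sequence above easier to follow, here is a small user-space
model of it in C. It assumes only what the log shows: each
thunder_mmc_probe attempt allocates a host (initializing its detect
work), the slot probe returns -EPROBE_DEFER while the GPIO or regulator
is missing, and the host is then freed. The names (struct deps,
struct model_host, model_probe, model_probe_until_bound, live_hosts)
are hypothetical and this is not the driver's code; only the
EPROBE_DEFER value (517) comes from include/linux/errno.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Kernel-internal errno for deferred probing (include/linux/errno.h). */
#define EPROBE_DEFER 517

/* Hypothetical snapshot of what is available at one probe attempt. */
struct deps {
	bool gpio_ready;      /* gpio_thunderx has registered */
	bool regulator_ready; /* mmc_supply_3v3 has registered */
};

/* Hypothetical, heavily simplified host; not struct mmc_host. */
struct model_host {
	int detect_inits; /* times this host's detect work was initialized */
};

static int live_hosts; /* hosts allocated and not yet freed */

static struct model_host *model_alloc_host(void)
{
	struct model_host *h = calloc(1, sizeof(*h));

	if (h) {
		h->detect_inits = 1; /* "mmc_alloc_host: mmc0 init delayed work" */
		live_hosts++;
	}
	return h;
}

static void model_free_host(struct model_host *h)
{
	assert(h && live_hosts > 0);
	free(h);
	live_hosts--;
}

/* One probe attempt: allocate a host, try the slot, and free the
 * host again if a dependency is still missing. */
static int model_probe(const struct deps *d, struct model_host **out)
{
	struct model_host *h = model_alloc_host();

	if (!h)
		return -1;
	if (!d->gpio_ready || !d->regulator_ready) {
		model_free_host(h); /* "mmc_free_host: mmc0" */
		return -EPROBE_DEFER;
	}
	*out = h; /* "mmc_add_host: mmc0" */
	return 0;
}

/* Retry once per snapshot, roughly as the driver core retries deferred
 * devices after other drivers bind. Returns 0 once bound. */
static int model_probe_until_bound(const struct deps *steps, int n,
				   int *attempts, struct model_host **out)
{
	int ret = -EPROBE_DEFER;

	*attempts = 0;
	for (int i = 0; i < n && ret == -EPROBE_DEFER; i++) {
		(*attempts)++;
		ret = model_probe(&steps[i], out);
	}
	return ret;
}
```

Feeding it three snapshots (nothing ready, GPIO ready, GPIO and
regulator ready) reproduces the three attempts in the log. The model
only captures the alloc/free pairing visible there; anything the real
driver keeps outside the host across attempts is not modeled.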

> > debugging added to __mmc_stop_host (maybe because serial/console has
> > been disabled by that point?).
>
> The serial console should work fine at this point, at least on those
> systems that I have tested this code with.
>
> Perhaps you added the debug print too late in the function, if the
> calls to disable_irq() or cancel_delayed_work_sync() are hanging?
>

This turned out to be related to busybox reboot. I switched to using
sysrq (echo o > /proc/sysrq-trigger) to power off, and now I can see my
printks.

> >
> > It appears that what causes this hang is the 'broken-cd' property, which
> > enables detect change polling on mmc1. I can flip the CMD signal
> > routing, making mmc0 the microSD and mmc1 the eMMC, and when I do that
> > there is no issue. So I think that when polling is enabled on mmc1 but
> > not on mmc0 (as above), the polling causes a hang after
> > __mmc_stop_host() is called for mmc0.
>
> The code in __mmc_stop_host() has been tested with both polling and
> GPIO card detection. That said, it looks to me like something weird
> is going on in the cavium mmc driver.
>
> What makes this even trickier is that it's uncommon, and not
> recommended, to use more than one mmc slot per host instance.
>

That was my mistake; there is one host instance per slot, and I see
this even with only one slot, as long as polling is enabled.
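For reference, a rough single-threaded model of what "polling enabled"
means here, as I understand the mmc core: broken-cd makes the host need
polling, so each rescan re-queues the detect work (about once per
second in the core), and the stop path sets rescan_disable before
cancelling. struct poll_host and the poll_* functions are hypothetical
simplifications, not the mmc core's code. The real cancel step,
cancel_delayed_work_sync(), also waits for a rescan that is already
running, which this model does not capture.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified host; not struct mmc_host. */
struct poll_host {
	bool needs_poll;     /* set when broken-cd is present */
	bool rescan_disable; /* set by the stop path before cancelling */
	bool detect_queued;  /* is the detect work pending? */
	int rescans;         /* how many times the rescan actually ran */
};

/* The rescan work: bail out if disabled, otherwise do the (elided)
 * detection and re-arm itself when polling is needed. */
static void poll_rescan(struct poll_host *h)
{
	h->detect_queued = false;
	if (h->rescan_disable)
		return;
	h->rescans++;
	/* ... card detection would happen here ... */
	if (h->needs_poll)
		h->detect_queued = true; /* re-queued ~1 s later in the kernel */
}

/* Stop path: disable rescans first, then drop any pending work. */
static void poll_stop(struct poll_host *h)
{
	h->rescan_disable = true;
	h->detect_queued = false;
}

/* Run a toy "workqueue" for up to max_ticks, one pending work per tick. */
static void poll_run(struct poll_host *h, int max_ticks)
{
	for (int t = 0; t < max_ticks && h->detect_queued; t++)
		poll_rescan(h);
}
```

With needs_poll set, the work keeps re-queuing itself until poll_stop()
runs; without it, a single rescan runs and nothing is left pending. That
difference is why only the broken-cd slot has detect work that shutdown
must cancel.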

Now that I can see my printks, I can confirm it hangs when
__mmc_stop_host calls cancel_delayed_work_sync():
# echo o > /proc/sysrq-trigger
[ 210.370200] sysrq: Power Off
[ 210.373147] kernel_shutdown_prepare
[ 210.896927] mmc_rescan: mmc0 irq=-22
[ 213.038191] mmc_host_classdev_shutdown mmc0
[ 213.042384] __mmc_stop_host: mmc0 cd_irq=-22
[ 213.046658] __mmc_stop_host: mmc0 calling cancel_delayed_work_sync
^^^ never comes back

If I comment out the call to cancel_delayed_work_sync() in
__mmc_stop_host, shutdown does not hang, so I suspect it has
something to do with mmc_alloc_host setting up the polling multiple
times.
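A user-space sketch of why a synchronous cancel can hang:
cancel_delayed_work_sync() does not return until an already-running
instance of the work has finished, so if a rescan were stuck inside the
driver, the shutdown path would wait with it. Everything below
(struct sync_work, work_fn, start_work, release_resource,
cancel_sync_timed and the timeout) is hypothetical; it models only that
waiting behaviour of the workqueue, not mmc_rescan, and whether a rescan
is actually blocked on this board is an assumption the log does not
show.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model, not kernel code. */
struct sync_work {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool running;       /* the work function is executing */
	bool resource_free; /* what the work function waits for */
};

/* The work item: blocks until resource_free is set, then finishes. */
static void *work_fn(void *arg)
{
	struct sync_work *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->resource_free)
		pthread_cond_wait(&w->cond, &w->lock);
	w->running = false;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int start_work(struct sync_work *w, pthread_t *tid)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->running = true; /* set before the thread exists, so no race */
	w->resource_free = false;
	return pthread_create(tid, NULL, work_fn, w);
}

static void release_resource(struct sync_work *w)
{
	pthread_mutex_lock(&w->lock);
	w->resource_free = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Models only the waiting half of cancel_delayed_work_sync(): wait for
 * a running instance to finish, but give up after timeout_ms so the
 * model reports a hang instead of blocking forever. */
static bool cancel_sync_timed(struct sync_work *w, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->lock);
	while (w->running && rc == 0)
		rc = pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
	done = !w->running;
	pthread_mutex_unlock(&w->lock);
	return done;
}
```

While the work is blocked, cancel_sync_timed() times out; once the
resource is released, the work returns and the cancel completes. That
matches the observation that removing cancel_delayed_work_sync() makes
the hang go away without explaining what the rescan is waiting on.
Build with -pthread.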

Best Regards,

Tim



> >
> > Any ideas?
>
> I hope the above thoughts can point you in a direction to narrow down
> this problem.
>
> >
> > Best Regards,
> >
> > Tim
> >
> > [1] https://lore.kernel.org/all/55A0788B-03E8-457E-B093-40FD93F1B9F3@xxxxxxxxxxxxx/
>
> Kind regards
> Uffe