Re: [PATCH] sdhci: fix the fake timeout bug

From: Du, Alek
Date: Sat Dec 01 2018 - 00:43:12 EST


On Fri, 30 Nov 2018 16:40:04 +0200
Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:

> So you are saying this only happens under virtualization?
>

Yes, I saw the issue under ACRN virtualization Service OS (4.19 kernel).
But theoretically it can happen in other case when scheduling is not
that good. (due to bad driver or other high priority task)


> >
> > Please look at the sdhci_enable_clk() below, there is a window in
> > clock stabilization check. The first check is to check status
> > register, the second check is to check if time passed. That's why I
> > can capture a case that after time passed, the actually clock
> > control register indicated that clock is stable. So the error
> > handling is wrong...
>
> Sure, but "Internal clock never stabilised." is not one of the
> errors you listed.

Sorry my bad not listing all the error log:

Case 1. clock stabilization timeout: (the below clock control dump shows clock is good)
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr: 0x00000000 | Version: 0x00001002
[159525.256277] mmc1: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000
[159525.256523] mmc1: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000
[159525.256752] mmc1: sdhci: Present: 0x1fff0000 | Host ctl: 0x00000000
[159525.256979] mmc1: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
[159525.257205] mmc1: sdhci: Wake-up: 0x00000000 | Clock: 0x0000fa03

Case 2. Reset timeout: (the same check window in sdhci_reset())
[ 7639.968613] mmc1: Reset 0x4 never completed.

Case 3. Hardware interrupt timeout
[ 1049.561728] mmc1: Timeout waiting for hardware interrupt.

>
> >
> > Also the sdhci_send_command commands is not in spin lock There is a
> > windows between mod_timer and real command send...
>
> What code path does not have a spin lock?
Ouch my bad, the sdhci_send_command are called either from spinlock or IRQ handler,
so this part is good ...

I'll send a new patch to cover case 1 and case 2 if you agree.

Thanks,
Alek