Re: [PATCH 3/3] arm64: dts: rockchip: Disable DCMDs on RK3399's eMMC controller.

From: Christoph Müllner
Date: Sat Mar 02 2019 - 03:29:40 EST


Hi Shawn,

On 3/2/19 1:47 AM, Shawn Lin wrote:
> On 2019/3/2 0:43, Christoph Muellner wrote:
>> When using direct commands (DCMDs) on an RK3399, we get spurious
>> CQE completion interrupts for the DCMD transaction slot (#31):
>
> I haven't seen this. Have you tried any newer code, for instance linux-next?

I can reproduce this with all kernel versions from 4.16 up to
linus/master. So all kernels with the cqhci driver (which was merged
for 4.15) are affected.

All I need to do to reproduce the issue is to boot the system with a
root file system on the eMMC. I use a Debian stable based rootfs.

>
>>
>> [  931.196520] ------------[ cut here ]------------
>> [  931.201702] mmc1: cqhci: spurious TCN for tag 31
>> [  931.206906] WARNING: CPU: 0 PID: 1433 at
>> /usr/src/kernel/drivers/mmc/host/cqhci.c:725 cqhci_irq+0x2e4/0x490
>> [  931.206909] Modules linked in:
>> [  931.206918] CPU: 0 PID: 1433 Comm: irq/29-mmc1 Not tainted
>> 4.19.8-rt6-funkadelic #1
>> [  931.206920] Hardware name: Theobroma Systems RK3399-Q7 SoM (DT)
>> [  931.206924] pstate: 40000005 (nZcv daif -PAN -UAO)
>> [  931.206927] pc : cqhci_irq+0x2e4/0x490
>> [  931.206931] lr : cqhci_irq+0x2e4/0x490
>> [  931.206933] sp : ffff00000e54bc80
>> [  931.206934] x29: ffff00000e54bc80 x28: 0000000000000000
>> [  931.206939] x27: 0000000000000001 x26: ffff000008f217e8
>> [  931.206944] x25: ffff8000f02ef030 x24: ffff0000091417b0
>> [  931.206948] x23: ffff0000090aa000 x22: ffff8000f008b000
>> [  931.206953] x21: 0000000000000002 x20: 000000000000001f
>> [  931.206957] x19: ffff8000f02ef018 x18: ffffffffffffffff
>> [  931.206961] x17: 0000000000000000 x16: 0000000000000000
>> [  931.206966] x15: ffff0000090aa6c8 x14: 0720072007200720
>> [  931.206970] x13: 0720072007200720 x12: 0720072007200720
>> [  931.206975] x11: 0720072007200720 x10: 0720072007200720
>> [  931.206980] x9 : 0720072007200720 x8 : 0720072007200720
>> [  931.206984] x7 : 0720073107330720 x6 : 00000000000005a0
>> [  931.206988] x5 : ffff00000860d4b0 x4 : 0000000000000000
>> [  931.206993] x3 : 0000000000000001 x2 : 0000000000000001
>> [  931.206997] x1 : 1bde3a91b0d4d900 x0 : 0000000000000000
>> [  931.207001] Call trace:
>> [  931.207005]  cqhci_irq+0x2e4/0x490
>> [  931.207009]  sdhci_arasan_cqhci_irq+0x5c/0x90
>> [  931.207013]  sdhci_irq+0x98/0x930
>> [  931.207019]  irq_forced_thread_fn+0x2c/0xa0
>> [  931.207023]  irq_thread+0x114/0x1c0
>> [  931.207027]  kthread+0x128/0x130
>> [  931.207032]  ret_from_fork+0x10/0x20
>> [  931.207035] ---[ end trace 0000000000000002 ]---
>>
>> The driver shows this message only for the first spurious interrupt,
>> because it uses WARN_ONCE(). Changing this to WARN() shows that this
>> happens quite frequently (up to once a second).
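To illustrate why WARN_ONCE() hides the frequency, here is a minimal user-space sketch (not the cqhci code) of the check behind the message: a task-completion notification (TCN) is spurious when its bit is set for a slot that has no outstanding request. Like WARN_ONCE(), it prints only the first occurrence, but it also counts every one. The names struct cq_model, cq_issue() and cq_handle_tcn() are hypothetical, and the 32-slot layout is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SLOTS 32 /* assumption: 32 task slots, slot 31 used for DCMD */

/* Hypothetical model of the host's bookkeeping; not struct cqhci_host. */
struct cq_model {
	uint32_t outstanding;  /* bit n set: a request was issued on slot n */
	unsigned int spurious; /* total spurious completions seen so far */
	bool warned;           /* emulates WARN_ONCE(): print only the first */
};

/* Issue a request on a slot (rings the doorbell bit in this model). */
static void cq_issue(struct cq_model *cq, unsigned int tag)
{
	assert(tag < NUM_SLOTS);
	cq->outstanding |= UINT32_C(1) << tag;
}

/*
 * Handle a TCN bitmask. Bits for outstanding slots complete normally;
 * any other set bit is counted as spurious. Returns the number of
 * spurious bits found in this notification.
 */
static unsigned int cq_handle_tcn(struct cq_model *cq, uint32_t tcn)
{
	uint32_t bogus = tcn & ~cq->outstanding;
	unsigned int n = 0;

	cq->outstanding &= ~tcn; /* retire the valid completions */

	for (unsigned int tag = 0; tag < NUM_SLOTS; tag++) {
		if (!(bogus & (UINT32_C(1) << tag)))
			continue;
		n++;
		if (!cq->warned) { /* only the first one is reported */
			fprintf(stderr, "spurious TCN for tag %u\n", tag);
			cq->warned = true;
		}
	}
	cq->spurious += n; /* ...but all of them are counted */
	return n;
}
```

In this model, a second TCN for slot 31 after its request already completed is flagged, and the counter keeps growing after the one-time message, which is what switching to WARN() made visible.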
>>
>> Since the eMMC 5.1 specification, where CQE and CQHCI are specified,
>> does not mention that spurious TCN interrupts for DCMDs can simply be
>> ignored, we must assume that this feature does not work reliably.
>>
>> The current implementation uses DCMD for REQ_OP_FLUSH only, and
>> I could not see any performance/power impact when disabling
>> this optional feature for RK3399.
>>
>> Therefore this patch disables DCMDs for RK3399.
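A rough user-space sketch of what the patch changes for REQ_OP_FLUSH, the only user of DCMD mentioned above. With DCMD enabled, a flush goes to the reserved slot #31; with it disabled, the flush is assumed here to be sent outside the command queue. The enum and function names are hypothetical, and both the non-CQE fallback and the rule that slot 31 is reserved only while DCMD is enabled are assumptions, not statements from this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 32
#define DCMD_SLOT 31 /* DCMDs use transaction slot #31 */

/* Hypothetical request types; only FLUSH is relevant here. */
enum req_op { OP_READ, OP_WRITE, OP_FLUSH };

/* Illustrative issue paths, not the mmc core's identifiers. */
enum issue_path {
	ISSUE_QUEUED,  /* ordinary CQE data task */
	ISSUE_DCMD,    /* direct command in the reserved slot */
	ISSUE_NON_CQE, /* assumption: sent outside the command queue */
};

static enum issue_path classify(enum req_op op, bool dcmd_enabled)
{
	if (op == OP_FLUSH)
		return dcmd_enabled ? ISSUE_DCMD : ISSUE_NON_CQE;
	return ISSUE_QUEUED;
}

/* Returns the CQE slot used for a request, or -1 if none is used. */
static int slot_for(enum req_op op, bool dcmd_enabled, int queue_tag)
{
	/* Assumption: slot 31 is reserved for DCMD only while DCMD is on. */
	int data_slots = dcmd_enabled ? DCMD_SLOT : NUM_SLOTS;

	switch (classify(op, dcmd_enabled)) {
	case ISSUE_DCMD:
		return DCMD_SLOT;
	case ISSUE_NON_CQE:
		return -1;
	case ISSUE_QUEUED:
	default:
		assert(queue_tag >= 0 && queue_tag < data_slots);
		return queue_tag;
	}
}
```

In this model, disabling DCMD only changes the path a flush takes; reads and writes are routed exactly as before.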
>
> We need to sort out the problem and see whether it can be solved, or
> else simply remove MMC_CAP2_CQE_DCMD from sdhci-of-arasan.

I fully agree that we should fix this in the driver
if the driver is at fault.

Therefore I debugged the issue, using an event log based on atomic_t
variables to observe what is going on. It is indeed the case that the
controller occasionally raises a second, spurious interrupt (i.e. an
interrupt for a slot whose doorbell bit was not set beforehand).
Only slot #31 is affected (i.e. only DCMDs), and only when DCMD
support is enabled.
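The event-log approach can be sketched in user space, with C11 atomics standing in for the kernel's atomic_t (a kernel version would use atomic_t and atomic_inc_return()). This is a hypothetical reconstruction, since the actual debug code is not shown in this thread: each writer reserves a unique ring index with one atomic add, so no lock is needed in interrupt context. The log is analysed only after logging stops, when a TCN for a tag without a preceding doorbell since its last completion counts as spurious. The names, the event types, and LOG_SIZE are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOG_SIZE 256 /* assumption: ring size; old entries get overwritten */

enum ev_type { EV_DOORBELL, EV_TCN };

struct event {
	enum ev_type type;
	unsigned int tag;
};

static struct event ev_log[LOG_SIZE];
static atomic_uint ev_next; /* static atomics start at zero in C11 */

/* Producer side: each caller reserves a unique slot, lock-free. */
static void log_event(enum ev_type type, unsigned int tag)
{
	unsigned int i = atomic_fetch_add(&ev_next, 1u) % LOG_SIZE;

	ev_log[i].type = type;
	ev_log[i].tag = tag;
}

/*
 * Offline analysis, run after logging has stopped: count TCNs for
 * @tag that were not preceded by a doorbell since the previous
 * completion. Assumes the ring did not wrap (fewer than LOG_SIZE events).
 */
static unsigned int count_spurious(unsigned int tag)
{
	unsigned int n = atomic_load(&ev_next);
	bool pending = false;
	unsigned int spurious = 0;

	assert(n <= LOG_SIZE);
	for (unsigned int i = 0; i < n; i++) {
		if (ev_log[i].tag != tag)
			continue;
		if (ev_log[i].type == EV_DOORBELL)
			pending = true;
		else if (pending)
			pending = false; /* normal completion */
		else
			spurious++;      /* TCN with no doorbell set */
	}
	return spurious;
}
```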

I disagree that we should disable it for sdhci-of-arasan (i.e. for all
Arasan eMMC 5.1 based controllers), because I cannot say that all
Arasan eMMC 5.1 based implementations are affected.
I only know that the one in the RK3399 is affected (mainly because I
don't have access to other devices with this IP core). Therefore the
series disables it only for RK3399.
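The scoping argument can be illustrated with a small user-space model: the driver keeps advertising DCMD by default and clears the capability only when the controller's device-tree node carries the disable-cqe-dcmd property added by this series. The node structure, dt_has_prop(), probe_caps2(), and the bit value of CAP2_CQE_DCMD are hypothetical. In the kernel, this would presumably be an of_property_read_bool() check that clears MMC_CAP2_CQE_DCMD in caps2, but the actual driver change is in another patch of the series and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative bit only; not the kernel's MMC_CAP2_CQE_DCMD value. */
#define CAP2_CQE_DCMD (1u << 0)

/* Hypothetical stand-in for a DT node: a NULL-terminated list of the
 * boolean property names present on the node (may itself be NULL). */
struct dt_node {
	const char *const *bool_props;
};

static bool dt_has_prop(const struct dt_node *np, const char *name)
{
	for (const char *const *p = np->bool_props; p && *p; p++)
		if (strcmp(*p, name) == 0)
			return true;
	return false;
}

/* Start from the driver default (DCMD on), then apply the per-SoC opt-out. */
static unsigned int probe_caps2(const struct dt_node *np)
{
	unsigned int caps2 = CAP2_CQE_DCMD;

	if (dt_has_prop(np, "disable-cqe-dcmd"))
		caps2 &= ~CAP2_CQE_DCMD;
	return caps2;
}
```

Only nodes that carry the property (here, the RK3399 eMMC node in rk3399.dtsi) lose the feature; other Arasan-based SoCs keep the default.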

Thanks,
Christoph


>
>>
>> Signed-off-by: Christoph Muellner <christoph.muellner@xxxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Philipp Tomsich <philipp.tomsich@xxxxxxxxxxxxxxxxxxxxx>
>> ---
>>  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> index 6cc1c9fa4ea6..1bbf0da4e01d 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> @@ -333,6 +333,7 @@
>>  		phys = <&emmc_phy>;
>>  		phy-names = "phy_arasan";
>>  		power-domains = <&power RK3399_PD_EMMC>;
>> +		disable-cqe-dcmd;
>>  		status = "disabled";
>>  	};
>>  
>
>