RE: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for BlueField-3 SoC

From: Liming Sun
Date: Tue Dec 19 2023 - 16:18:46 EST

> -----Original Message-----
> From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Sent: Monday, December 11, 2023 6:39 AM
> To: Liming Sun <limings@xxxxxxxxxx>; Christian Loehle
> <christian.loehle@xxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; David
> Thompson <davthompson@xxxxxxxxxx>
> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> BlueField-3 SoC
>
> On 30/11/23 15:19, Liming Sun wrote:
> >
> >
> >> -----Original Message-----
> >> From: Christian Loehle <christian.loehle@xxxxxxx>
> >> Sent: Monday, November 27, 2023 8:36 AM
> >> To: Liming Sun <limings@xxxxxxxxxx>; Adrian Hunter
> >> <adrian.hunter@xxxxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; David
> >> Thompson <davthompson@xxxxxxxxxx>
> >> Cc: linux-mmc@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> >> BlueField-3 SoC
> >>
> >> On 18/11/2023 13:46, Liming Sun wrote:
> >>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> >>> intermittent eMMC timeout issue reported on some cards under eMMC
> >>> stress test.
> >>>
> >>> Reported error message:
> >>> dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
> >>>
> >>> Signed-off-by: Liming Sun <limings@xxxxxxxxxx>
> >>> ---
> >>> drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
> >>> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> index 3a3bae6948a8..3c8fe8aec558 100644
> >>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
> >>> #ifdef CONFIG_ACPI
> >>> static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
> >>> .ops = &sdhci_dwcmshc_ops,
> >>> - .quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> >>> + .quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> >>> + SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
> >>> .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
> >>> SDHCI_QUIRK2_ACMD23_BROKEN,
> >>> };
> >>
> >> __mmc_blk_ioctl_cmd: data error ?
> >> What stresstest are you running that issues ioctl commands?
> >> On which commands does the timeout occur?
> >> Anyway you should be able to increase the timeout in ioctl structure
> >> directly, i.e. in userspace, or does that not work?
> >
> > It's running stress test with tool like "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1 --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1 --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128 --rw=randrw --overwrite=1 --runtime=36000 --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3 --filename=/dev/mmcblk0"
> > The tool(application) is owned by user or with some standard tool.
>
> fio does not send mmc ioctls, so I am also a bit confused about
> how you get "__mmc_blk_ioctl_cmd: data error -110" ?

There are other activities or background tasks going on at the same time. I assume
it's one of those other MMC accesses that is affected by the fio stress load and
times out. Does that make sense?