Re: [PATCH V15 06/22] mmc: block: Add blk-mq support

From: Dmitry Osipenko
Date: Wed Feb 21 2018 - 15:50:27 EST


On 29.11.2017 16:41, Adrian Hunter wrote:
> Define and use a blk-mq queue. Discards and flushes are processed
> synchronously, but reads and writes asynchronously. In order to support
> slow DMA unmapping, DMA unmapping is not done until after the next request
> is started. That means the request is not completed until then. If there is
> no next request then the completion is done by queued work.

Hello,

I'm using (running linux-next and doing some upstream development for) some old
NVIDIA Tegra tablet that has built-in (internal) and external MMC's and with the
blk-mq being enabled I'm observing a soft lockup. The lockup seems is
reproducible quite reliably by running fsck on any MMC partition, sometimes
kernels lockups on boot during probing partitions table (weirdly only when both
SDHCI's are present, i.e. internal storage enabled in DT and external SD is
inserted/enabled) and it also lockups pretty quickly in a case of just a general
use. Reverting mmc/ commits up to 1bec43a3b18 ("Remove option not to use
blk-mq") and disabling CONFIG_MMC_MQ_DEFAULT makes everything working fine
again. There is also a third SDHCI populated with built-in WiFi/Bluetooth SDIO
and I'm observing odd MMC timeouts with the blk-mq enabled, disabling
CONFIG_MMC_MQ_DEFAULT fixes these timeouts as well.

Any thoughts?

WiFi issue
========================

[ 38.247006] mmc2: Timeout waiting for hardware interrupt.
[ 38.247027] brcmfmac: brcmf_escan_timeout: timer expired
[ 38.247036] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 38.247047] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000001
[ 38.247055] mmc2: sdhci: Blk size: 0x00007008 | Blk cnt: 0x00000000
[ 38.247062] mmc2: sdhci: Argument: 0x21000008 | Trn mode: 0x00000013
[ 38.247070] mmc2: sdhci: Present: 0x01d70000 | Host ctl: 0x00000013
[ 38.247077] mmc2: sdhci: Power: 0x00000001 | Blk gap: 0x00000000
[ 38.247084] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 38.247091] mmc2: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000
[ 38.247098] mmc2: sdhci: Int enab: 0x02ff000b | Sig enab: 0x02fc000b
[ 38.247105] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[ 38.247112] mmc2: sdhci: Caps: 0x61ff30b0 | Caps_1: 0x00000000
[ 38.247119] mmc2: sdhci: Cmd: 0x0000353a | Max curr: 0x00000001
[ 38.247126] mmc2: sdhci: Resp[0]: 0x00001800 | Resp[1]: 0x08002db5
[ 38.247133] mmc2: sdhci: Resp[2]: 0x16da8000 | Resp[3]: 0x00000400
[ 38.247139] mmc2: sdhci: Host ctl2: 0x00000000
[ 38.247146] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x17c47200
[ 38.247152] mmc2: sdhci: ============================================
[ 38.247250] brcmfmac: brcmf_sdio_readframes: read 520 bytes from channel 1
failed: -84
[ 38.247274] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[ 40.807019] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 40.807042] brcmfmac: brcmf_notify_escan_complete: Scan abort failed
[ 48.487007] mmc2: Timeout waiting for hardware interrupt.
[ 48.487057] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 48.487096] mmc2: sdhci: Sys addr: 0x00000000 | Version: 0x00000001
[ 48.487128] mmc2: sdhci: Blk size: 0x00007040 | Blk cnt: 0x00000001
[ 48.487160] mmc2: sdhci: Argument: 0x21000040 | Trn mode: 0x00000013
[ 48.487191] mmc2: sdhci: Present: 0x01d70000 | Host ctl: 0x00000013
[ 48.487221] mmc2: sdhci: Power: 0x00000001 | Blk gap: 0x00000000
[ 48.487251] mmc2: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 48.487281] mmc2: sdhci: Timeout: 0x0000000e | Int stat: 0x00000000
[ 48.487313] mmc2: sdhci: Int enab: 0x02ff000b | Sig enab: 0x02fc000b
[ 48.487343] mmc2: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[ 48.487374] mmc2: sdhci: Caps: 0x61ff30b0 | Caps_1: 0x00000000
[ 48.487404] mmc2: sdhci: Cmd: 0x0000353a | Max curr: 0x00000001
[ 48.487435] mmc2: sdhci: Resp[0]: 0x00001000 | Resp[1]: 0x08002db5
[ 48.487466] mmc2: sdhci: Resp[2]: 0x16da8000 | Resp[3]: 0x00000400
[ 48.487493] mmc2: sdhci: Host ctl2: 0x00000000
[ 48.487525] mmc2: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x17c47200
[ 48.487552] mmc2: sdhci: ============================================
[ 48.487749] brcmfmac: brcmf_sdio_readframes: read 480 bytes from channel 1
failed: -84
[ 48.487822] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK


Soft lockup issue
========================

# fsck -f /dev/disk/by-uuid/6768309f-3545-49d5-9ac7-d5be24d35ef2
fsck ÐÐ util-linux 2.30.2
e2fsck 1.43.9 (8-Feb-2018)
ÐÑÐÑÐÐ 1: ÐÑÐÐÐÑÐÐ inodes, ÐÐÐÐs, Ð ÑÐÐÐÐ ÑÐÐÐÐÑÐÐ
Pass 2: Checking ÐÐÑÐÐÐÐ structure
Pass 3: Checking ÐÐÑÐÐÐÐ connectivity
Pass 4: Checking reference counts
Pass 5: Checking ÐÑÑÐÐÐ summary information
....

[ 125.925436] INFO: task kworker/0:3H:263 blocked for more than 60 seconds.
[ 125.925496] Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[ 125.925530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 125.925562] kworker/0:3H D 0 263 2 0x00000000
[ 125.925653] Workqueue: kblockd mmc_blk_mq_complete_work
[ 125.925747] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[ 125.925805] [<c0b998f0>] (schedule) from [<c086c870>]
(__mmc_claim_host+0xdc/0x24c)
[ 125.925849] [<c086c870>] (__mmc_claim_host) from [<c08750bc>]
(mmc_start_bkops+0x4c/0x190)
[ 125.925895] [<c08750bc>] (mmc_start_bkops) from [<c087d254>]
(mmc_blk_urgent_bkops+0x48/0x5c)
[ 125.925945] [<c087d254>] (mmc_blk_urgent_bkops) from [<c087f2e0>]
(mmc_blk_mq_complete_prev_req.part.5+0x74/0x210)
[ 125.925995] [<c087f2e0>] (mmc_blk_mq_complete_prev_req.part.5) from
[<c08813d0>] (mmc_blk_mq_complete_work+0x30/0x34)
[ 125.926049] [<c08813d0>] (mmc_blk_mq_complete_work) from [<c014306c>]
(process_one_work+0x1f8/0x584)
[ 125.926093] [<c014306c>] (process_one_work) from [<c014402c>]
(worker_thread+0x68/0x5d4)
[ 125.926144] [<c014402c>] (worker_thread) from [<c014965c>] (kthread+0x178/0x184)
[ 125.926188] [<c014965c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[ 125.926209] Exception stack(0xd579ffb0 to 0xd579fff8)
[ 125.926239] ffa0: 00000000 00000000
00000000 00000000
[ 125.926272] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[ 125.926301] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 125.926361] INFO: task fsck.ext4:471 blocked for more than 60 seconds.
[ 125.926399] Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[ 125.926427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 125.926455] fsck.ext4 D 0 471 470 0x00000000
[ 125.926536] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[ 125.926593] [<c0b998f0>] (schedule) from [<c015823c>] (io_schedule+0x20/0x40)
[ 125.926654] [<c015823c>] (io_schedule) from [<c0220acc>]
(wait_on_page_bit+0x120/0x144)
[ 125.926705] [<c0220acc>] (wait_on_page_bit) from [<c0220bc4>]
(__filemap_fdatawait_range+0xd4/0x114)
[ 125.926757] [<c0220bc4>] (__filemap_fdatawait_range) from [<c0223784>]
(file_write_and_wait_range+0x98/0xb4)
[ 125.926810] [<c0223784>] (file_write_and_wait_range) from [<c02c0300>]
(blkdev_fsync+0x2c/0x5c)
[ 125.926867] [<c02c0300>] (blkdev_fsync) from [<c02b7ce8>]
(vfs_fsync_range+0x4c/0xb0)
[ 125.926912] [<c02b7ce8>] (vfs_fsync_range) from [<c02b7dd4>] (do_fsync+0x4c/0x74)
[ 125.926954] [<c02b7dd4>] (do_fsync) from [<c02b80a0>] (SyS_fsync+0x1c/0x20)
[ 125.926996] [<c02b80a0>] (SyS_fsync) from [<c0101000>]
(ret_fast_syscall+0x0/0x54)
[ 125.927016] Exception stack(0xd5631fa8 to 0xd5631ff0)
[ 125.927050] 1fa0: 00480fc8 00481108 00000004 00481108
00000000 00000000
[ 125.927086] 1fc0: 00480fc8 00481108 00000000 00000076 00000000 7f2bb750
00483268 bec602e8
[ 125.927113] 1fe0: 00000076 bec60288 b6de5e8b b6d67cf6
[ 177.015618] random: crng init done
[ 187.365434] INFO: task kworker/0:3H:263 blocked for more than 60 seconds.
[ 187.365491] Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[ 187.365524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 187.365556] kworker/0:3H D 0 263 2 0x00000000
[ 187.365647] Workqueue: kblockd mmc_blk_mq_complete_work
[ 187.365741] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[ 187.365798] [<c0b998f0>] (schedule) from [<c086c870>]
(__mmc_claim_host+0xdc/0x24c)
[ 187.365842] [<c086c870>] (__mmc_claim_host) from [<c08750bc>]
(mmc_start_bkops+0x4c/0x190)
[ 187.365887] [<c08750bc>] (mmc_start_bkops) from [<c087d254>]
(mmc_blk_urgent_bkops+0x48/0x5c)
[ 187.365936] [<c087d254>] (mmc_blk_urgent_bkops) from [<c087f2e0>]
(mmc_blk_mq_complete_prev_req.part.5+0x74/0x210)
[ 187.365986] [<c087f2e0>] (mmc_blk_mq_complete_prev_req.part.5) from
[<c08813d0>] (mmc_blk_mq_complete_work+0x30/0x34)
[ 187.366039] [<c08813d0>] (mmc_blk_mq_complete_work) from [<c014306c>]
(process_one_work+0x1f8/0x584)
[ 187.366083] [<c014306c>] (process_one_work) from [<c014402c>]
(worker_thread+0x68/0x5d4)
[ 187.366134] [<c014402c>] (worker_thread) from [<c014965c>] (kthread+0x178/0x184)
[ 187.366178] [<c014965c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[ 187.366200] Exception stack(0xd579ffb0 to 0xd579fff8)
[ 187.366229] ffa0: 00000000 00000000
00000000 00000000
[ 187.366262] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[ 187.366291] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 187.366350] INFO: task fsck.ext4:471 blocked for more than 60 seconds.
[ 187.366388] Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[ 187.366416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 187.366445] fsck.ext4 D 0 471 470 0x00000000
[ 187.366526] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[ 187.366582] [<c0b998f0>] (schedule) from [<c015823c>] (io_schedule+0x20/0x40)
[ 187.366644] [<c015823c>] (io_schedule) from [<c0220acc>]
(wait_on_page_bit+0x120/0x144)
[ 187.366693] [<c0220acc>] (wait_on_page_bit) from [<c0220bc4>]
(__filemap_fdatawait_range+0xd4/0x114)
[ 187.366745] [<c0220bc4>] (__filemap_fdatawait_range) from [<c0223784>]
(file_write_and_wait_range+0x98/0xb4)
[ 187.366799] [<c0223784>] (file_write_and_wait_range) from [<c02c0300>]
(blkdev_fsync+0x2c/0x5c)
[ 187.366857] [<c02c0300>] (blkdev_fsync) from [<c02b7ce8>]
(vfs_fsync_range+0x4c/0xb0)
[ 187.366902] [<c02b7ce8>] (vfs_fsync_range) from [<c02b7dd4>] (do_fsync+0x4c/0x74)
[ 187.366944] [<c02b7dd4>] (do_fsync) from [<c02b80a0>] (SyS_fsync+0x1c/0x20)
[ 187.366987] [<c02b80a0>] (SyS_fsync) from [<c0101000>]
(ret_fast_syscall+0x0/0x54)
[ 187.367008] Exception stack(0xd5631fa8 to 0xd5631ff0)
[ 187.367041] 1fa0: 00480fc8 00481108 00000004 00481108
00000000 00000000
[ 187.367078] 1fc0: 00480fc8 00481108 00000000 00000076 00000000 7f2bb750
00483268 bec602e8
[ 187.367103] 1fe0: 00000076 bec60288 b6de5e8b b6d67cf6

--
Dmitry