Re: [PATCH v1] mmc: core: check R1_STATUS for erase/trim/discard

From: Adrian Hunter
Date: Fri Apr 26 2024 - 02:17:16 EST


On 25/04/24 19:18, Ulf Hansson wrote:
> + Wolfram, Adrian (to see if they have some input)
>
> On Tue, 23 Apr 2024 at 22:02, Kamal Dasu <kamal.dasu@xxxxxxxxxxxx> wrote:
>>
>> When erase/trim/discard completion was converted to mmc_poll_for_busy(),
>> optional ->card_busy() host ops support was added. The sdhci ->card_busy()
>> callback can return busy for long periods, causing mmc_do_erase() to block
>> during a discard operation, as shown below during mkfs.f2fs:
>>
>> Info: [/dev/mmcblk1p9] Discarding device
>> [ 39.597258] sysrq: Show Blocked State
>> [ 39.601183] task:mkfs.f2fs state:D stack:0 pid:1561 tgid:1561 ppid:1542 flags:0x0000000d
>> [ 39.610609] Call trace:
>> [ 39.613098] __switch_to+0xd8/0xf4
>> [ 39.616582] __schedule+0x440/0x4f4
>> [ 39.620137] schedule+0x2c/0x48
>> [ 39.623341] schedule_hrtimeout_range_clock+0xe0/0x114
>> [ 39.628562] schedule_hrtimeout_range+0x10/0x18
>> [ 39.633169] usleep_range_state+0x5c/0x90
>> [ 39.637253] __mmc_poll_for_busy+0xec/0x128
>> [ 39.641514] mmc_poll_for_busy+0x48/0x70
>> [ 39.645511] mmc_do_erase+0x1ec/0x210
>> [ 39.649237] mmc_erase+0x1b4/0x1d4
>> [ 39.652701] mmc_blk_mq_issue_rq+0x35c/0x6ac
>> [ 39.657037] mmc_mq_queue_rq+0x18c/0x214
>> [ 39.661022] blk_mq_dispatch_rq_list+0x3a8/0x528
>> [ 39.665722] __blk_mq_sched_dispatch_requests+0x3a0/0x4ac
>> [ 39.671198] blk_mq_sched_dispatch_requests+0x28/0x5c
>> [ 39.676322] blk_mq_run_hw_queue+0x11c/0x12c
>> [ 39.680668] blk_mq_flush_plug_list+0x200/0x33c
>> [ 39.685278] blk_add_rq_to_plug+0x68/0xd8
>> [ 39.689365] blk_mq_submit_bio+0x3a4/0x458
>> [ 39.693539] __submit_bio+0x1c/0x80
>> [ 39.697096] submit_bio_noacct_nocheck+0x94/0x174
>> [ 39.701875] submit_bio_noacct+0x1b0/0x22c
>> [ 39.706042] submit_bio+0xac/0xe8
>> [ 39.709424] blk_next_bio+0x4c/0x5c
>> [ 39.712973] blkdev_issue_secure_erase+0x118/0x170
>> [ 39.717835] blkdev_common_ioctl+0x374/0x728
>> [ 39.722175] blkdev_ioctl+0x8c/0x2b0
>> [ 39.725816] vfs_ioctl+0x24/0x40
>> [ 39.729117] __arm64_sys_ioctl+0x5c/0x8c
>> [ 39.733114] invoke_syscall+0x68/0xec
>> [ 39.736839] el0_svc_common.constprop.0+0x70/0xd8
>> [ 39.741609] do_el0_svc+0x18/0x20
>> [ 39.744981] el0_svc+0x68/0x94
>> [ 39.748107] el0t_64_sync_handler+0x88/0x124
>> [ 39.752455] el0t_64_sync+0x168/0x16c
>
> Thanks for the detailed log!
>
>>
>> The fix skips ->card_busy() and uses the MMC_SEND_STATUS and R1_STATUS
>> check for the MMC_BUSY_ERASE busy_cmd case in the mmc_busy_cb() function.
>>
>> Fixes: 0d84c3e6a5b2 ("mmc: core: Convert to mmc_poll_for_busy() for erase/trim/discard")
>> Signed-off-by: Kamal Dasu <kamal.dasu@xxxxxxxxxxxx>
>> ---
>> drivers/mmc/core/mmc_ops.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>> index 3b3adbddf664..603fbd78c342 100644
>> --- a/drivers/mmc/core/mmc_ops.c
>> +++ b/drivers/mmc/core/mmc_ops.c
>> @@ -464,7 +464,8 @@ static int mmc_busy_cb(void *cb_data, bool *busy)
>> u32 status = 0;
>> int err;
>>
>> - if (data->busy_cmd != MMC_BUSY_IO && host->ops->card_busy) {
>> + if (data->busy_cmd != MMC_BUSY_IO &&
>> + data->busy_cmd != MMC_BUSY_ERASE && host->ops->card_busy) {
>> *busy = host->ops->card_busy(host);
>> return 0;
>> }
>
> So it seems like the ->card_busy() callback is broken for your mmc
> host-driver and platform. Can you perhaps provide the information
> about what HW/driver you are using?
>
> The point with using the ->card_busy() callback, is to avoid sending
> the CMD13. Ideally it should be cheaper/faster and in most cases it
> translates to a read of a register. For larger erases, we would
> probably end up sending the CMD13 periodically every 32-64 ms, which
> shouldn't be a problem. However, for smaller erases and discards, we
> may want the benefit the ->card_busy() callback provides us.
>
> I would suggest that we first try to fix the implementation of the
> ->card_busy() callback for your HW. If that isn't possible or fails,
> then let's consider the approach you have taken in the $subject patch.

Note, sdhci drivers can override host->ops. For example,
sdhci-omap.c has:

host->mmc_host_ops.card_busy = sdhci_omap_card_busy;

Probably, if ->card_busy() cannot be supported, then setting
it to NULL would work, because mmc_busy_cb() only calls the
callback when it is non-NULL and otherwise polls with CMD13.

host->mmc_host_ops.card_busy = NULL; /* Cannot detect card busy */
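
To illustrate why clearing the callback should help, here is a rough
userspace model of the dispatch in mmc_busy_cb(). It is only a sketch:
every model_* name is hypothetical, the structs are simplified stand-ins
rather than the real kernel definitions, and the busy_cmd check is left
out. It assumes a callback that is stuck reporting busy, which may or may
not be what happens on the reporter's hardware.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct model_host;

struct model_host_ops {
	/* Nonzero while the card signals busy; NULL if the host cannot tell. */
	int (*card_busy)(struct model_host *host);
};

struct model_host {
	struct model_host_ops ops;
	bool card_is_busy;		/* simulated real state of the card */
	unsigned int cmd13_sent;	/* number of simulated CMD13 polls */
};

/* Simulated CMD13 (SEND_STATUS): reports the simulated card state. */
static bool model_send_status_busy(struct model_host *host)
{
	host->cmd13_sent++;
	return host->card_is_busy;
}

/* Assumed failure mode: a callback that reports busy no matter what. */
static int model_stuck_card_busy(struct model_host *host)
{
	(void)host;
	return 1;
}

/*
 * Simplified dispatch: use ->card_busy() when the host provides it,
 * otherwise fall back to polling the card status with CMD13.
 */
static int model_busy_cb(struct model_host *host, bool *busy)
{
	if (host->ops.card_busy) {
		*busy = host->ops.card_busy(host) != 0;
		return 0;
	}

	*busy = model_send_status_busy(host);
	return 0;
}
```

With card_busy pointing at model_stuck_card_busy, model_busy_cb() keeps
reporting busy even though card_is_busy is false, and no CMD13 is sent.
After setting card_busy to NULL, the same call goes through the CMD13
path and reports the card as idle.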